PREFACE.


In a sequence of fundamental memoirs, G. Udny Yule, the 
eminent English statistician, has proposed certain methods of time 
series analysis which are of an essentially wider scope than the 
classical methods used in the search for periodicities. The basis 
of the new methods is a concept of flexible periodicity which in 
an ideal case reduces to the classical, functionally rigid periodicity. 
The importance and the broad applicability of the new ideas has 
been stressed particularly in subsequent discussion of the nature 
of business cycles. 

In the recent rapid development of the theory of probability, 
the production of A. Khintchine and A. Kolmogoroff represents 
a genuine discontinuity. A firm, axiomatic foundation has been 
obtained for the theory; other important results belong to the 
theory of random processes, i. e. hypothetical models for the 
analysis of time series. In accordance with the great diversity of 
time series, the main types of random process are of quite different 
structure. 

In the theory of probability, the approaches of G. U. Yule fall 
under the heading of the stationary random process as defined and 
studied by A. Khintchine. The present work might be described 
as a trial to subject the fertile methods of empirical analysis 
proposed by Yule to an examination and a development by the 
use of the mathematically strict tools supplied by the modern theory 
of probability. This statement, however, implies no valuation of 
the results and should be regarded rather as a tribute to my 
sources of inspiration and to the traditions of my milieu of study. 

My most sincere thanks are due to my teacher, Professor Harald
Cramér. His brilliant courses, distinguished by a spirit of realism
combined with penetrating logic, have laid the foundation for my 
further work. As far as the present thesis is concerned, this is 
true not only in general but also in respect to particular parts 
thereof, as indicated by the references to his 1933 course on Time 
Series Analysis. I wish to evidence my deep gratitude to Professor 
Cramér also for the encouragement and interest shown me at all
times, culminating in his detailed reading of the first version
of the manuscript. Our subsequent discussions have caused a revision,
particularly of the treatment of questions of convergence in
probability. 

To the Royal Swedish Academy of Sciences I want to express 
my respectful gratitude for a generous grant covering a substantial 
part of the expenses for printing and numerical calculation. 

I am greatly indebted to my friends and colleagues Mr. G. Elfving,
Mr. W. Feller and Mr. O. Lundberg for numerous stimulating
discussions and for having read the manuscript and corrected many 
errors. I have also profited to a great extent by consultations with 
a large number of research workers in the different fields touched 
upon in the thesis. These obligations are, however, too comprehensive 
and indefinite to be expressed in detail. 


Stockholm, July 1938. 


H. W.


PREFACE TO THE SECOND EDITION. 

Stationary processes having in the last 15 years been the subject 
of intensive research, important results have been obtained both 
regarding their theory and their many fruitful applications. In 
presenting a new edition of my thesis, the recent development is 
briefly dealt with in Appendices 1-2 (these replace Appendices A-B 
of the first edition, which were devoted to special topics), whereas 
the main text is left unaltered, except for a slight revision that 
makes use only of material available in 1938. I am greatly indebted 
to Dr. Whittle for writing Appendix 2, in which two main lines 
of progress are surveyed, viz. spectral theory and methods of statistical
inference. The short Appendix 1 comments, by way of numbered
footnotes, on a few specific points in the main text.

The first edition had the dual form of an expository survey and 
a research report. It is hoped that the second edition may still 
serve as an introduction to the theory and the applications of 
stationary processes. 

Uppsala, March 1953.


H. W.
Introduction.


1. Remarks on the scope of the study. 

Observational series which describe phenomena changing with time 
may be roughly classified in two broad categories, viz. evolutive and 
stationary. In the former case, different sections of the time series 
are dissimilar in one or more respects. For instance, the sectional 
averages may be distinctly different, or some other structural 
property of the series may present variation. In the analysis of 
evolutive time series, absolute time plays a fundamental role, e. g. 
as the independent variable in a trend function, or as a fixed scale 
in studying the development of a phenomenon from an initial state 
of rest. 

Stationary time series are unchanging in respect to their general 
structure. The fluctuations up and down in such a series may 
seem random or show tendencies to regularity — in any case, the 
character of the series is, on the whole, the same in different 
sections. Or otherwise expressed, in the analysis of stationary series 
time is allotted the secondary role of a passive medium. Even 
without preparation, observational time series are frequently sta- 
tionary. On the other hand, the deviations from a trend form a 
type of derived time series which is often stationary. 

Among stationary as well as evolutive series we may distinguish 
a great many different types, and in actual fact the dissimilarities 
are deepgoing. As a consequence, the problems which are relevant 
from the viewpoint of the applications are different for the various 
types of series. Now regarding stationary series, if we judge from 
the earlier literature on the subject, their analysis might seem equiva- 
lent to the search for periodicities. In the present volume, let this 
be said at once, time series analysis is taken in a much wider sense. 

Considering the classical methods of Fourier and Schuster, the 
hypothesis underlying these methods is that the time series under 
analysis might contain hidden periodicities, that is functional com- 
ponents which are periodic in the strict mathematical sense. It 
is well-known that these methods have often been applied with 

1 -535097. H . Wold . 



2 ANALYSIS OF STATIONARY TIME SERIES [Introd. 1 

definite success, and equally well-known that in many fields of 
scientific research they have met a severe criticism. An essential 
point in the criticism is that the idea of strict periods cannot 
possibly be realistic and adequate in certain applications. It has 
been claimed, for example, that in the theory of business cycles 
the hypothetical approach must be flexible to some extent, admit 
small changes in the periods and the amplitudes etc. A modified 
type of approach thus being called for, the difficulty is to find a 
precisely defined combination of the ideas of periodicity and of 
flexibility. It is evident that a strict hypothetical set-up as required 
can be reached only on the basis of the theory of probability. 

Though the above mentioned critical argument is old, it was not 
until rather recently that approaches have been suggested which 
allow for changes in the waves in the time series under analysis. 
There are two main lines of approach, both of them germinating 
from G. U. Yule. Let these be briefly outlined. 

Starting from a purely random series as given, for example, by 
dice-throwing, G. U. Yule ((1921), (1926)) forms the differences of a 
fixed order, and finds that the series thus obtained presents a tend- 
ency to regular fluctuations. E. Slutsky ((1927), (1937)) studies 
the effect of more general linear operations, and finds that under 
certain circumstances the resulting series will present sinusoidal 
waves with slowly changing amplitude and phase, waves showing 
a puzzling likeness to the cycles in economic time series. Nice 
examples of this are given by suitable moving averages of the 
primary random series. In the terminology of this study, the ap- 
proaches mentioned are special cases of the scheme of moving averages. 
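
The effect described by Yule and Slutsky can be tried out numerically. The following is a minimal sketch, not a reproduction of their experiments: the series length, the ten-term equal-weight moving sum, the seed, and the helper names `moving_average` and `lag_correlation` are all assumptions chosen for illustration. It compares the correlation between neighbouring terms in a dice series with that in its moving sum.

```python
import random

def moving_average(series, weights):
    """Weighted moving sums over consecutive terms of `series`."""
    h = len(weights)
    return [sum(w * series[i + j] for j, w in enumerate(weights))
            for i in range(len(series) - h + 1)]

def lag_correlation(x, k):
    """Ordinary correlation coefficient between x[i] and x[i + k]."""
    a, b = x[:len(x) - k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1938)                         # fixed seed (arbitrary)
dice = [rng.randint(1, 6) for _ in range(5000)]   # purely random series
smooth = moving_average(dice, [1.0] * 10)         # ten-term moving sum

r_dice = lag_correlation(dice, 1)      # close to zero
r_smooth = lag_correlation(smooth, 1)  # close to 9/10 in theory
print(round(r_dice, 3), round(r_smooth, 3))
```

Neighbouring ten-term sums share nine of their ten terms, which is why the derived series moves in smooth waves although the primary series is purely random.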

The second type of approach is introduced by G. U. Yule (1927) 
in a study on sunspot numbers. Considering the sunspot index in 
a set of equidistant time points, Yule investigates the multiple 
correlation between these observations, and approximates by the 
use of linear regression analysis each observation by a linear 
function of the next preceding ones. The scheme thus implicitly 
defined will be called the scheme of linear autoregression. Using a
physical interpretation, Yule gives a suggestive illustration of the
new idea: a pendulum subjected to a stream of random shocks
will be ruled by a scheme of linear autoregression. To a certain 
extent, the movement of the pendulum will bear resemblance to a 
free swinging, but the random impulses will cause a continuous 
shift in amplitude and phase. 
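
Yule's pendulum illustration may be sketched as follows. The second-order relation x(t) = a1·x(t-1) + a2·x(t-2) + e(t), the damping factor 0.95, the period of 10 time units, the normal shocks and the function name `autoregression` are assumptions made for this example only; the text does not fix any of them.

```python
import math
import random

def autoregression(a1, a2, shocks, x0=1.0, x1=1.0):
    """Generate x(t) = a1*x(t-1) + a2*x(t-2) + e(t) from given shocks e(t)."""
    x = [x0, x1]
    for e in shocks:
        x.append(a1 * x[-1] + a2 * x[-2] + e)
    return x

# Coefficients of a damped oscillation (hypothetical values).
rho, period = 0.95, 10.0
a1 = 2 * rho * math.cos(2 * math.pi / period)
a2 = -rho ** 2

n = 400
free = autoregression(a1, a2, [0.0] * n)          # pendulum left alone
rng = random.Random(1927)                         # fixed seed (arbitrary)
shocked = autoregression(a1, a2, [rng.gauss(0, 1) for _ in range(n)])

tail_free = max(abs(v) for v in free[-50:])       # swings have died out
tail_shocked = max(abs(v) for v in shocked[-50:]) # swings are kept alive
print(tail_free, tail_shocked)
```

Without shocks the swinging dies away; with shocks it persists, its amplitude and phase shifting continuously.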
Using a comprehensive term, the two schemes mentioned will be
called schemes of linear regression. It is seen to be a common
feature of these schemes that a random element plays a funda- 
mental, active role. This constitutes a distinct contrast to the scheme 
of hidden periodicities — as we shall call the hypothesis of strict 
periods — and makes the schemes of linear regression a priori 
plausible in several instances where the scheme of hidden period- 
icities has been criticized. 1 

From the viewpoint of the theory of probability, the schemes of 
linear regression are special cases of the stationary random process 
as defined and studied by A. Khintchine ((1932) — (1934)). Let us 
discuss the situation in some detail. 

Considering a phenomenon as described by an observational time 
series, let us fix arbitrarily a finite set of time points, say
$(t) = (t_1, t_2, \ldots, t_n)$. In a probabilistic theory of the
phenomenon, we must necessarily assume that the behaviour of the
time series in the $n$ points considered is ruled by a definite
probability distribution in $n$ dimensions. Generally speaking, this
distribution may be taken to be defined by a distribution function,
say $F(t_1, \ldots, t_n; u_1, \ldots, u_n)$, where $u_1, \ldots, u_n$
are real variables. For instance, considering a set consisting of but
one time point $(t_1)$, the hypothetical function $F(t_1; u_1)$ will
indicate the probability that the observational value in $t_1$ is
less than or equal to $u_1$. Having stated this, it is clear that we
must assume certain relations of consistency between the functions
$F$ which belong to different sets $(t)$; otherwise the hypothetical
set-up might contradict itself. For instance, it is evident that
$F(t_1; u_1)$ must be assumed to satisfy all relations of the type
$F(t_1; u_1) = F(t_1, t_2; u_1, \infty)$.

We have seen that the probabilistic treatment of a time series 
requires a set of distribution functions, say $\{F\}$, such that there is
one function $F$ corresponding to every finite set $(t)$ of time points,
and that the functions $F$ satisfy certain consistency relations. On
the other hand, such a hypothetical set-up will give a sufficient
basis for a formal probabilistic analysis. Any set $\{F\}$ with
properties as mentioned is called a random process, and according to
a fundamental theorem of A. Kolmogoroff (1933) such a set $\{F\}$
is equivalent to a probability distribution in an infinite number of 
dimensions. Of course, each time point corresponds to one dimen- 
sion in this distribution. 

In defining a random process, we may either choose our points [...]

[...] the case of continuous stationary processes will be touched upon
only incidentally. 

As far as I know, A. Khintchine is alone in having dealt with 
the discrete stationary process in full generality; his chief result 
is that the stationary processes are ruled by a law of great numbers. 
The present study being concerned with other aspects of the sta- 
tionary process, we shall next give a few comments on the main 
lines followed. 

Chapter I serves a double purpose. In surveying the leading 
methods in the search for periodicities, particular attention is drawn 
to the hypotheses underlying these methods. By thus pointing out 
the rather narrow basis of the methods considered, the need for 
other types of hypothetical scheme is made clear, and the analysis 
of other types of approach prepared. On the other hand, after 
this rather detailed survey, the hypothesis of hidden periodicities 
will need no further separate treatment. 

Chapter II is reserved for a general analysis of the discrete 
stationary process. Sections 13 and 14 are preparatory, and show 
that in certain respects the stationary processes may be dealt with 
in the same manner as random variables in a finite number of 
dimensions. In particular, the singular case introduced in section
14 corresponds to a multi-dimensional probability distribution which
is entirely concentrated in a plane or some other linear sub-space.

The discrete stationary process being extremely general, sections
15 and 16 give a few examples of processes obtained by different
specializations. In this way we arrive at strict definitions of the 
processes of linear regression which cover the above mentioned 
approaches suggested by G. U. Yule. Further, the scheme of 
hidden periodicities is obtained by means of a singular stationary 
process. Detailed illustrations of the processes thus defined are 
given through model time series, i. e. series constructed in an arti- 
ficial way on the basis of random sampling numbers. Finally, in 
section 16, the normal stationary process is defined by a straight- 
forward generalization of the normal distribution in a finite number 
of dimensions. 

In the latter part of the second chapter, the structural properties 
of the general stationary process are studied. The field being wide 
and unexplored, the analysis has been restricted to the elementary 
features. Generally speaking, only such properties have been taken 
into consideration which could be studied by the use of linear 
operations and autocorrelation coefficients. As the autocorrelation
coefficients correspond to the mixed moments of second order in a 
set of ordinary, one-dimensional variables, the analysis will, in 
certain respects, be parallel to the familiar theory of multiple 
correlation. 

In section 17 it is shown that the autocorrelation coefficients of a
discrete stationary process may always be interpreted as the 
Fourier coefficients of a non-decreasing function. This theorem 
discloses clearly that it is only in special cases that a periodogram 
can tell us something relevant about a time series. 

The linear autoregression analysis as prepared in section 18 and 
developed in section 19 is based on the idea of subjecting the 
discrete stationary process to a treatment which is parallel to a 
time series analysis by means of the methods proposed by G. U. 
Yule (1927) and already referred to. The autoregression analysis 
thus corresponds to a linear regression analysis in a finite set of 
one-dimensional variables. The periodogram analysis, on the other 
hand, may be interpreted as a graduation by means of simple har- 
monics. If we consider the forecasts delivered by the two methods, 
the autoregression analysis reaches the limit beyond which we 
cannot proceed when employing only linear methods. 

Using the same tools as in section 19, the analysis of the structure 
of the discrete stationary process is, in section 20, carried further 
in a quite new direction. In spite of its wide comprehensiveness, 
the general discrete stationary process is found to be of a readily 
surveyable structure. In fact, the general process is built up by 
two components which may be interpreted as generalized processes 
of hidden periodicities and of linear regression respectively. 

In point of principle, the methods used in Chapter II are of a 
scope which would admit of generalizations of the analysis in 
different directions. Using non-linear operations, it would be possible 
to perform an autoregression analysis corresponding to curvilinear 
regression analysis in the case of a finite number of variables. 
Further, the analysis might be extended to the case when the time 
series is multi-dimensional, i. e. when several properties of the 
phenomenon observed are studied simultaneously. 

The stochastical difference equations studied in the first two 
sections of Chapter III form a generalization of the ordinary linear 
difference equations. While the solutions of the latter equations 
are ordinary functions, the solutions of the former equations are 
discrete random processes. Considering an ordinary linear difference
equation, its solution describes how a certain oscillatory mechanism 
will develop under given conditions. On the other hand, if a prim- 
ary stream of impulses is defined by means of a random process, 
the solutions of a corresponding stochastical difference equation 
give the probability laws which will rule the oscillatory mechanism 
when subjected to the primary impulses. Of course, if the actual 
development of the mechanism is known, the corresponding series 
of impulses is readily obtained. Moreover, among the solutions of 
stochastical difference equations we find both evolutive and sta- 
tionary random processes, e. g. the process of linear autoregression 
as strictly defined in section 15. 

The latter part of Chapter III is reserved for a detailed study 
of the processes of linear regression, a chief purpose being to 
illustrate the general analysis in Chapter II, and the theory of 
stochastical difference equations. Particular attention is paid to 
the forecast situation. In contradistinction to the processes of 
hidden periodicities, the processes of linear regression contain an 
active random element which affects the efficiency of the forecast. 
As a matter of fact, in a process of linear regression, the efficiency 
is the less, the longer the interval of time forecasted over. On the 
other hand, the short time forecasts are the more efficient. In view 
of the applications this circumstance is advantageous, for, of course, 
the main interest is always focussed upon the short time forecasts. 
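
The dependence of forecast efficiency on the forecast interval can be made concrete in the simplest case. The sketch below assumes, for illustration only, a first-order scheme x(t) = a·x(t-1) + e(t) with |a| < 1 and uncorrelated shocks of unit variance; the coefficient a = 0.8 and the function name `forecast_error_variance` are hypothetical. The linear forecast of x(t+k) is then a^k·x(t), and the share of the total variance left unexplained works out as 1 - a^(2k).

```python
def forecast_error_variance(a, k, shock_variance=1.0):
    """Variance of x(t+k) - a**k * x(t) when x(t) = a*x(t-1) + e(t).

    The error is e(t+k) + a*e(t+k-1) + ... + a**(k-1)*e(t+1),
    a sum of k uncorrelated terms.
    """
    return shock_variance * sum(a ** (2 * j) for j in range(k))

a = 0.8                                   # hypothetical coefficient
process_variance = 1.0 / (1.0 - a ** 2)   # variance of x(t) itself

# Share of the variance NOT accounted for by a forecast k steps ahead.
share_unexplained = [forecast_error_variance(a, k) / process_variance
                     for k in range(1, 11)]

for k, s in enumerate(share_unexplained, start=1):
    print(k, round(s, 3))
```

The unexplained share grows steadily towards 1 with the forecast interval, so that in this scheme only the short forecasts carry much information.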

The analysis in Chapter III, too, suggests certain generalizations. 
Thus, nothing prevents us from defining random processes by means 
of non-linear stochastical difference equations. Moreover, multi- 
dimensional random processes may be defined on the basis of systems 
of stochastical difference relations. A few remarks along the latter 
line are given in sections 31 and 32. 

Chapter IV, finally, gives a few applications of the theoretical
analysis to observational time series of the stationary type. Such 
a series being given, its correlogram — my term for the auto- 
correlation periodogram — is adopted as an indicator of which 
type of process to apply. Explicit applications being given of the 
processes of linear autoregression and of moving averages, general 
methods are indicated for finding suitable numerical values of the 
parameters involved. For instance, assuming a given time series to 
be a moving average of an unknown primary series, suitable values 
for the weights of the hypothetical moving average are derived 
from the correlogram of the given series. Further, a general method
is given for deriving the primary series which corresponds to such 
a set of weights. 
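
The two steps just described can be sketched in the simplest conceivable case. The example assumes a moving average with only two weights, x(t) = e(t) + b·e(t-1), for which the first autocorrelation coefficient is r_1 = b/(1 + b²); this special case, the starting value e(-1) = 0, and the function names are assumptions of the illustration and are not the general methods of Chapter IV.

```python
import math

def weight_from_r1(r1):
    """Solve r1 = b / (1 + b*b) for the root with |b| <= 1."""
    if r1 == 0:
        return 0.0
    if abs(r1) > 0.5:
        raise ValueError("no real weight gives |r1| > 1/2")
    return (1 - math.sqrt(1 - 4 * r1 * r1)) / (2 * r1)

def primary_series(x, b):
    """Recover e(t) from x(t) = e(t) + b*e(t-1), assuming e(-1) = 0."""
    e, prev = [], 0.0
    for v in x:
        prev = v - b * prev
        e.append(prev)
    return e

b = weight_from_r1(0.4)                  # weight implied by r1 = 0.4

# Round trip on a small made-up primary series.
shocks = [1.0, -2.0, 0.5, 3.0]
x = [s + b * p for s, p in zip(shocks, [0.0] + shocks[:-1])]
recovered = primary_series(x, b)
print(b, recovered)
```

The quadratic has two roots, b and 1/b; choosing the one with |b| ≤ 1 is what makes the recursion in `primary_series` numerically stable.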

The purpose of the applications is to illustrate certain general 
methods of analysis, not to supply a theory of the phenomena 
described by the time series examined. Accordingly, I make little 
account of the hypothetical schemes arrived at, and no attempts 
to test the significance of the parameters determined. On the 
contrary, warnings are repeatedly given against attaching importance
to the numerical results of the analysis, one reason being that
significance questions are extremely intricate in time series analysis.
(It has turned out, however, that the results find support by the 
test methods that have been established after the 1st edition of this 
book was published. In Appendix 2 of this 2nd edition, P. Whittle 
has kindly undertaken to summarize the general test methods he 
has developed.) 


2. Principles of notation. 

The present volume being concerned with both theory and 
applications, symbols are needed for probabilistic as well as statistical 
concepts. Since the purpose of this section is to indicate the 
principles of notation, no completeness is aimed at with respect to 
the definitions of the elementary concepts considered. For these, 
reference is made to G. U. Yule and M. G. Kendall (1937) as to 
the theory of statistics, and to H. Cramér (1937) as to the theory
of probabilities. 

Generally speaking, the notations try to bring into relief the 
correspondence between theoretical and empirical concepts. Thus, 
for parallel theoretical and empirical concepts the same symbol 
will be used — in the latter case marked by a bar. As a rule, 
Greek letters will be reserved for random variables, Roman letters 
for functions, ordinary variables, and constants. 

In agreement with these general principles, random variables as 
dealt with in statistics will be denoted by $\bar\xi$, $\bar\eta$, etc.
Considering such a random or statistical variable, say $\bar\xi$, its
observational values in a particular statistical population will be
distinguished by running indices, e. g.

(1)    $\bar\xi_1, \bar\xi_2, \ldots, \bar\xi_n$.
Irrespective of the order of the elements, an observational set
(1) is uniquely determined by the corresponding function of cumulative
relative frequencies, say $\bar F(u)$. Thus $\bar F(u)$, for every real
$u$, equals the relative number of statistical units with an
observational value $\bar\xi_i \le u$. The function $\bar F(u)$ will be
called the (empirical) distribution function of the observational set.
By the use of the Stieltjes integral, the elementary characteristics
of an empirical distribution (1) can be conveniently expressed in
terms of $\bar F(u)$. The average of $\bar\xi$ in the population
considered is

$\bar m = \frac{1}{n} \sum_{i=1}^{n} \bar\xi_i = \int_{-\infty}^{\infty} u \, d\bar F(u)$.

The central moment of order $k$ is denoted by $\bar\mu_k$, and reads

$\bar\mu_k = \frac{1}{n} \sum_{i=1}^{n} (\bar\xi_i - \bar m)^k = \int_{-\infty}^{\infty} (u - \bar m)^k \, d\bar F(u)$.

Thus, $\bar\mu_2$ is the variance. Denoting the dispersion by $\bar D$, we have

(2)    $\bar D^2 = \bar\mu_2$.
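
As a small numerical illustration of these empirical characteristics, the following sketch computes the distribution function, the average, the central moments and the dispersion of a made-up observational set. The six values and the function names are assumptions of the example.

```python
def empirical_distribution(values):
    """Return F(u): the relative number of observations <= u."""
    n = len(values)

    def F(u):
        return sum(1 for v in values if v <= u) / n

    return F

def central_moment(values, k):
    """Central moment of order k about the average of the set."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** k for v in values) / n

obs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]     # hypothetical observational set
F = empirical_distribution(obs)

mean = sum(obs) / len(obs)               # 5.0
variance = central_moment(obs, 2)        # 4.0
dispersion = variance ** 0.5             # 2.0, in agreement with (2)

print(F(4.0), mean, variance, dispersion)
```

Since the order of the elements plays no part in any of these quantities, the same results are obtained after any rearrangement of `obs`.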

In studying a statistical variable $\bar\xi$, we introduce a corresponding
hypothetical random variable $\xi$. Such a variable $\xi$, which is also
called stochastical or aleatory, is completely characterised by its
distribution function $F(u)$. By definition, $F(u)$ indicates the
probability that $\xi$ is less than or equal to $u$. This is expressed
by the relation

(3)    $F(u) = P[\xi \le u]$,

where $P$ is the probability function of $\xi$.

In the analysis of a variable $\bar\xi$, it is actually only the
observational values $\bar\xi_i$ and the variable $\xi$ as defined by the
distribution function $F(u)$, which will appear in the formal
developments. However, in the literature the observational values
$\bar\xi_i$ are also often supplied with hypothetical parallels, called
sample values. For instance, the distribution of $\xi$ is spoken of as
constituted by an infinite population of sample values $\xi_i$. Such a
terminology is often convenient, and is, incidentally, used also in the
present study. As we need no notation for these sample values, the
symbol $\xi_i$ will, as indicated later on, be reserved for representing
independent variables in random sampling.

Let $g(x)$ be a function of a real variable, and $\xi$ a random variable
with distribution function $F(u)$. Under general conditions, $g(\xi)$
represents a random variable with expectation given by

(4)    $E[g(\xi)] = \int_{-\infty}^{\infty} g(u) \, dF(u)$.

In particular, the elementary characteristics of $\xi$ may be interpreted
in this way. For instance, the mean ($m$), dispersion ($D$), and the
central moments ($\mu_k$) are given by

(5)    $m = E[\xi] = \int_{-\infty}^{\infty} u \, dF(u)$,    $D^2(\xi) = \mu_2$,

(6)    $\mu_k = E[(\xi - m)^k] = \int_{-\infty}^{\infty} (u - m)^k \, dF(u)$.

00 

Any multi-dimensional random variable may be looked upon as a
combination of one-dimensional variables. For such variables we
shall use notations of type $\xi = [\xi^{(1)}, \ldots, \xi^{(h)}]$.
Interpreting $\xi$ in this manner, we can often use the same notations
as in the one-dimensional case. When full information is required,
e. g. concerning (3), we shall use notations such as

$F(u_1, u_2, \ldots, u_h) = P[\xi^{(1)} \le u_1, \xi^{(2)} \le u_2, \ldots, \xi^{(h)} \le u_h]$.

In the same way, the expression (4) may be regarded as the expectation
of a function $g(\xi)$ of a multi-dimensional variable $\xi$, only that
$E[g(\xi)]$ must be interpreted as a vector in the space of $g(\xi)$, and
that the integral must be extended over the space of $\xi$. For instance,
let a one-dimensional random variable $p(\xi)$ be defined as a function
of an $h$-dimensional variable $\xi$ with distribution function
$F(u) = F(u_1, \ldots, u_h)$. Then the distribution function of $p(\xi)$,
say $G_p(x)$, will equal, for an arbitrarily fixed $x$, the expectation of
a function $g_{x,p}(u) = g_{x,p}(u_1, \ldots, u_h)$ defined by the
relations $g_{x,p}(u) = 1$ as $p(u) \le x$, and $g_{x,p}(u) = 0$ as
$p(u) > x$. Thus,

$G_p(x) = E[g_{x,p}(\xi)] = \int g_{x,p}(u_1, \ldots, u_h) \, dF(u_1, \ldots, u_h)$.

In particular, letting $F(u_1, \ldots, u_h)$ be the distribution function
of a variable $\xi = [\xi^{(1)}, \ldots, \xi^{(h)}]$, and $G_p(x)$ that of
the sum $p(\xi) = p_1 \xi^{(1)} + p_2 \xi^{(2)} + \cdots + p_h \xi^{(h)}$,
where the $p_i$'s are real constants, we have

(7)    $G_p(x) = \int_{p(x)} dF(u_1, \ldots, u_h)$,

the integration being extended over a half-space $p(x)$ defined by
$p(x) = [p_1 u_1 + \cdots + p_h u_h \le x]$. The present study being
chiefly concerned with linear functions of random variables, this
formula will frequently come into use.

Considering a combined variable $\xi = [\xi^{(1)}, \ldots, \xi^{(h)}]$
with distribution function $F(u_1, \ldots, u_h)$, and taking $p_k = 0$
for $k \ne i$ and $p_i = 1$, formula (7) gives the distribution function
of the individual variable $\xi^{(i)}$. Denoting the resulting functions
by $F_i(u)$, the variables $\xi^{(i)}$ are called independent if, for an
arbitrary set $(u_1, \ldots, u_h)$, the following relation is satisfied,

$F(u_1, \ldots, u_h) = F_1(u_1) \cdot F_2(u_2) \cdots F_h(u_h)$.

In the case of independent variables, the expectation satisfies a
general relation involving arbitrary functions $g_i$, viz.

(8)    $E[g_1(\xi^{(1)}) \cdot g_2(\xi^{(2)}) \cdots g_h(\xi^{(h)})] = E[g_1(\xi^{(1)})] \cdot E[g_2(\xi^{(2)})] \cdots E[g_h(\xi^{(h)})]$.

Putting $p_i = 1$ for all $i$, formula (7) gives the distribution
function, say $G(u)$, of the sum of our $h$ variables $\xi^{(i)}$. If
these are independent, $G(u)$ can be expressed by the use of the familiar
composition (convolution, Faltung) symbol $*$, viz.

(9)    $G(u) = F_1(u) * F_2(u) * \cdots * F_h(u)$.

In the case of two independent variables, the convolution is given by

$G(u) = F_1(u) * F_2(u) = \int_{-\infty}^{\infty} F_1(u - x) \, dF_2(x)$.
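
For a discrete distribution the Stieltjes integral in the convolution reduces to a finite sum over the points carrying probability. The sketch below works this out for the sum of two independent throws of a die; the example and the names `p`, `F` and `G` are chosen for illustration. Exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction

faces = range(1, 7)
p = {x: Fraction(1, 6) for x in faces}   # point masses dF(x) of one die

def F(u):
    """Distribution function of a single throw."""
    return sum(p[x] for x in faces if x <= u)

def G(u):
    """Distribution function of the sum of two independent throws,
    obtained as the convolution: sum over x of F(u - x) * dF(x)."""
    return sum(F(u - x) * p[x] for x in faces)

for u in (2, 7, 12):
    print(u, G(u))
```

Thus the probability of a sum not exceeding 7 comes out as 21/36 = 7/12, in agreement with direct enumeration of the 36 equally likely outcomes.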


A random sample containing $h$ elements, and belonging to a random
variable $\xi$ with distribution function $F(u)$, may be defined as a
sample value of an $h$-dimensional variable obtained by combining $h$
independent variables $\xi_i$ which have the same distribution function
$F(u)$. The concept of a random sample is thus purely theoretical, and
corresponds to that of a statistical population. As already mentioned,
we need no particular notation for the elements in a random sample.

The ordinary correlation coefficient between two interdependent
random variables, say $\xi^{(i)}$ and $\xi^{(k)}$, will be denoted
$r = r(\xi^{(i)}; \xi^{(k)})$.

Let $\ldots, \xi(-1), \xi(0), \xi(1), \xi(2), \ldots$ be a sequence of
interdependent random variables such that the characteristics

(10)    $m(\xi; t) = E[\xi(t)]$;    $v(\xi; t, k) = E[\xi(t) \cdot \xi(t + k)]$

will exist for all integral $t$ and $k$. Let it further be assumed that
the quantities defined by

(11)    $m(\xi) = \lim_{n \to \infty,\ n' \to -\infty} \frac{1}{n - n' + 1} \sum_{t=n'}^{n} m(\xi; t)$;
        $v_k(\xi) = \lim_{n \to \infty,\ n' \to -\infty} \frac{1}{n - n' + 1} \sum_{t=n'}^{n} v(\xi; t, k)$

will exist. Under these circumstances, the coefficients $r_k$ defined by

(12)    $r_k = r_k(\xi) = \frac{v_k - m^2}{v_0 - m^2}$

will be called the autocorrelation coefficients of the sequence $\xi(t)$.
It should be observed that this definition holds also in the special
case when $\xi(t)$ reduces to an ordinary function of $t$.
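
The special case of an ordinary function lends itself to a direct check. For a single harmonic x(t) = C·cos(λt + φ) the averages in the definition give r_k = cos(λk). The sketch below approximates the limits by averaging over t = -n, ..., n with a large n; the choice n' = -n, the particular harmonic, and the function name `autocorrelation` are assumptions of the example.

```python
import math

def autocorrelation(x, k, n):
    """Approximate r_k for a function x(t) by averaging over t = -n..n."""
    ts = range(-n, n + 1)
    count = 2 * n + 1
    m = sum(x(t) for t in ts) / count
    v_k = sum(x(t) * x(t + k) for t in ts) / count
    v_0 = sum(x(t) ** 2 for t in ts) / count
    return (v_k - m * m) / (v_0 - m * m)

lam = 2 * math.pi / 12                   # a period of 12 time units

def x(t):
    """A single harmonic with amplitude 3 and phase 0.7 (hypothetical)."""
    return 3.0 * math.cos(lam * t + 0.7)

r = [autocorrelation(x, k, 6000) for k in range(4)]
for k, value in enumerate(r):
    print(k, round(value, 4), round(math.cos(lam * k), 4))
```

Neither the amplitude nor the phase of the harmonic enters into the coefficients, which depend on the frequency alone.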

Let (1) represent an empirical time series obtained by observations 
in the equidistant time points $t = 1, 2, \ldots, n$. Following G. U.
Yule (1926), the coefficients $\bar r_k$ defined by

(13)    $\bar r_k = \frac{\sum_{i=1}^{n-k} \bar\xi_i \cdot \bar\xi_{i+k} - (n - k) \cdot \bar m_1 \cdot \bar m_2}{(n - k) \cdot \bar D_1 \cdot \bar D_2}$,

where

$\bar m_1 = \frac{1}{n-k} \sum_{i=1}^{n-k} \bar\xi_i$;    $\bar m_2 = \frac{1}{n-k} \sum_{i=k+1}^{n} \bar\xi_i$,

and

$\bar D_1^2 = \frac{1}{n-k} \sum_{i=1}^{n-k} (\bar\xi_i - \bar m_1)^2$;    $\bar D_2^2 = \frac{1}{n-k} \sum_{i=k+1}^{n} (\bar\xi_i - \bar m_2)^2$,

will be called serial coefficients. These obviously form an empirical
parallel to the autocorrelation coefficients.
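
Formula (13) translates directly into a short routine. The following sketch is one possible transcription; the function name `serial_coefficient` and the two test series are assumptions of the example.

```python
def serial_coefficient(xs, k):
    """Serial coefficient of order k according to formula (13).

    The first n-k and the last n-k observations are treated as two
    variables, each with its own average and dispersion.
    """
    n = len(xs)
    if not 0 <= k < n - 1:
        raise ValueError("need 0 <= k < n - 1")
    first, second = xs[:n - k], xs[k:]
    size = n - k
    m1 = sum(first) / size
    m2 = sum(second) / size
    d1 = (sum((v - m1) ** 2 for v in first) / size) ** 0.5
    d2 = (sum((v - m2) ** 2 for v in second) / size) ** 0.5
    cross = sum(a * b for a, b in zip(first, second))
    return (cross - size * m1 * m2) / (size * d1 * d2)

trend = [1.0, 2.0, 3.0, 4.0, 5.0]               # steadily rising series
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]      # alternating series

print(serial_coefficient(trend, 1))    # +1 up to rounding
print(serial_coefficient(zigzag, 1))   # -1 up to rounding
```

The routine fails with a division by zero when either section of the series is constant, since the corresponding dispersion then vanishes.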



CHAPTER I. 

A survey of hypotheses and methods proposed for
the analysis of time series. 

3. Scope and disposition. 

This chapter aims at giving a historical perspective to the 
investigations in the subsequent chapters. Within the bounds of 
the survey fall the fundamental facts concerning the principal 
theoretical schemes set up for the study of stationary time series 
in one dimension. The leading methods for fitting the considered 
schemes to observational data will also be examined. For the sake 
of concreteness some descriptive methods will be touched upon 
incidentally. The survey being concerned with a general outline 
only, reference is given to H. Burkhardt (1904) and K. Stumpff
((1927), (1937)) for further material. 

The purely functional schemes will be dealt with first among 
the theoretical models for a given time series. At the opposite 
extreme is taken that purely probabilistic scheme in which the 
series is regarded as a random sample of an aleatory variable. The 
other schemes may be looked upon as intermediate cases. Of the 
mixed schemes, the approach of hidden periodicities is treated first. 
The series is here assumed to be additively composed of indepen- 
dent functional and random elements. The last section of the survey 
deals with the scheme of moving averages, studied in certain 
special cases by G. U. Yule (1921) and E. Slutsky (1927), and 
with the scheme proposed by G. U. Yule (1927) under the name of 
disturbed harmonics (cf. p. 2). 

The time points considered will be equidistant. The unit of 
equidistance will be taken for the time unit, a simplification which 
evidently does not involve any loss of generality. 
4. Functional schemes.


The functional schemes aim at a perfect functional representation 
of the empirical data. 

As a first example of functional approach, we take the hypothesis 
of a periodic function. Assuming the period to equal $h$, and
denoting the hypothetical function by $x(t)$, the following relation
will be satisfied for any $t$

(14)    $x(t) - x(t - h) = 0$.

As $t$ is given for integral values only, it will be sufficient to
consider the case of an integral $h$. Then (14) forms an ordinary
difference equation of order $h$. According to the theory of difference
equations, the solutions of (14) may be written 

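(15)    $x(t) = m + \sum_k C_k \cos\left(\frac{2\pi}{h} k t + \varphi_k\right)
             = m + \sum_k \left(A_k \cos \frac{2\pi}{h} k t + B_k \sin \frac{2\pi}{h} k t\right)$,

where $k$ runs from 1 to $(h - 1)/2$ or $h/2$. The real parameters
$A_k$, $B_k$, $C_k \ge 0$, and $\varphi_k$ in (15) are connected by the
relations

(16)    $C_k^2 = A_k^2 + B_k^2$;    $\varphi_k = \mathrm{arc\,tg}(-B_k / A_k)$.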


The approach now described will be termed the scheme of periodic
functions. According to (15), the approach function may be considered
to be composed of superposed harmonics, each having an amplitude
$C_k$, a phase $\varphi_k$, and an angular frequency $\lambda_k$ given by

$\lambda_k = 2\pi k / h$.


The expression (15) may be taken as the basis for various generalized 
hypotheses. We start with the approach 

(17)    $x(t) = m + \sum_{k=1}^{s} C_k \cos(\lambda_k t + \varphi_k) = m + \sum_{k=1}^{s} (A_k \cos \lambda_k t + B_k \sin \lambda_k t),$

which will be called the scheme of superposed harmonics. Alternately, 
the function (17) will be called a composed harmonic. 

Denoting by $p_k$ the periods of the individual harmonics in (15) 
or (17), we have 

(18)    $p_k = 2\pi/\lambda_k.$

In the scheme (15) the periods of the individual harmonics representing 
$x(t)$ are seen to be true fractions of $h$. In the generalized 
scheme (17) this restriction will not be laid down on the individual 
periods as given by (18). On the other hand, since $t$ takes on 
integral values only, it follows that for any integer $n$ the substitution 
of $\lambda_k - n \cdot 2\pi$ for $\lambda_k$ will have no effect in (17). Neither will 
$x(t)$ be affected by a simultaneous substitution of $-\lambda_k$ for $\lambda_k$, 
and $-\varphi_k$ for $\varphi_k$. Thus it would involve no loss of generality to 
assume that $0 < \lambda_k \le \pi$. This means that, in point of principle, 
the analysis must be restricted to periods not less than 2 time 
units. A study of shorter periods requires a reduction of the time 
unit chosen as a basis for the analysis. However, unless explicitly 
stated otherwise, we shall merely assume that $0 < \lambda_k$. 
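
The equivalence of the two forms of a harmonic in (17), and the frequency reductions just described, may be checked numerically. The following is a minimal sketch in Python under stated assumptions: the names `to_amplitude_phase`, `fold_frequency` and `harmonic` are illustrative and do not occur in the text, the numerical values are arbitrary, and `math.atan2` is used instead of the principal value of arc tg so that the quadrant of the phase is resolved.

```python
import math

def to_amplitude_phase(a, b):
    """Rewrite A*cos(lam*t) + B*sin(lam*t) as C*cos(lam*t + phi)."""
    c = math.hypot(a, b)          # C^2 = A^2 + B^2
    phi = math.atan2(-b, a)       # A = C*cos(phi), B = -C*sin(phi)
    return c, phi

def fold_frequency(lam, phi):
    """Map (lam, phi) to an equivalent pair with 0 <= lam <= pi (integer t only)."""
    lam = lam % (2 * math.pi)     # subtracting n*2*pi changes nothing at integer t
    if lam > math.pi:             # reflect: (lam, phi) -> (2*pi - lam, -phi)
        lam, phi = 2 * math.pi - lam, -phi
    return lam, phi

def harmonic(t, c, lam, phi):
    """Single harmonic C*cos(lam*t + phi), constant term left out."""
    return c * math.cos(lam * t + phi)

# Arbitrary illustrative values; lam = 5.0 lies outside (0, pi].
a, b, lam = 1.5, -0.8, 5.0
c, phi = to_amplitude_phase(a, b)
lam_f, phi_f = fold_frequency(lam, phi)

same_form = all(
    math.isclose(a * math.cos(lam * t) + b * math.sin(lam * t),
                 harmonic(t, c, lam, phi), abs_tol=1e-9)
    for t in range(50)
)
same_after_folding = all(
    math.isclose(harmonic(t, c, lam, phi), harmonic(t, c, lam_f, phi_f), abs_tol=1e-9)
    for t in range(50)
)
```

Both flags come out true for integral $t$; for non-integral $t$ the folded harmonic would in general differ from the original one, which is the reason why periods shorter than 2 time units cannot be distinguished.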

A function $x(t)$ of type (17) belongs, as is well known, to the class 
of almost periodic functions in the sense of H. Bohr (see e. g. (1925)). 
The following property of an almost periodic function $x(t)$ is recorded 
for later use: An $\varepsilon > 0$ being arbitrarily given, there exists 
for every number $t_0 > 0$ an integer $T(\varepsilon, t_0) > t_0$ such that for 
every $t$ (see H. Bohr (1925), p. 88) 

(19)    $|x(t + T) - x(t)| < \varepsilon.$

A rough description of a scheme of superposed harmonics is 
yielded by its periodogram. This has the frequency $\lambda > 0$ for 
horizontal axis, and indicates by ordinates in $\lambda_k$ the corresponding 
squared amplitudes $C_k^2$. It is seen that the periodogram (which 
is met in several variants, e. g. with the periods $p_k = 2\pi/\lambda_k$ as 
abscissae or with the amplitudes $C_k$ as ordinates) does not pay 
any regard to the constant term or to the phases appearing in the 
expression (17). Another variant is the integrated periodogram. 
This is a function, say $S(\lambda)$, defined by 

(20)    $S(\lambda) = \sum_{\lambda_k \le \lambda} C_k^2 \Big/ \sum_{k=1}^{s} C_k^2.$

Thus $S(\lambda)$ is a step (or saltus) function which is proportionate to the 
sum of squared amplitudes with frequency not greater than $\lambda$. An 
example of periodogram and corresponding integrated periodogram 
is given in the figure below. 
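
The definition (20) may be illustrated by a short computation. The sketch below is written in Python; the function name `integrated_periodogram` and the three frequencies and amplitudes are assumptions made only for the sake of the example.

```python
def integrated_periodogram(freqs, amps):
    """Return the step function S(lam) of (20) for given frequencies and amplitudes."""
    total = sum(c * c for c in amps)

    def s(lam):
        # Share of squared amplitude carried by frequencies not greater than lam.
        return sum(c * c for f, c in zip(freqs, amps) if f <= lam) / total

    return s

# Hypothetical composed harmonic with three components.
S = integrated_periodogram(freqs=[0.5, 1.2, 2.5], amps=[1.0, 2.0, 1.0])
steps = [S(0.4), S(0.5), S(1.2), S(3.0)]
```

The function rises from 0 to 1 by one jump at each frequency $\lambda_k$, the height of the jump being the share of $C_k^2$ in the total.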

In the study of light, the periodogram has an experimental parallel in the 
spectrum. The prism spreads the light waves according to their frequency $\lambda_k$, and 
the individual lines in the spectrum indicate the energy of respective wave components. 
This energy is proportionate to the squared amplitude. The energy represented 
in an interval of the spectrum is thus proportionate to the sum of the 
ordinates in the corresponding interval of the ordinary periodogram, and proportionate 
to the increase of the integrated periodogram in the same interval. 

Fig. 1. Ordinary periodogram ($C^2$, vertical lines), and corresponding integrated 
periodogram ($S(\lambda)$, horizontal lines). 

Analysis of white light produced spectra, where the lines were thin and lying 
very close together. This fact gave rise to the idea of continuous spectra and 
periodograms, in so far as the energy belonging to an interval was thought of as 
the integral of a spectral density. A survey and a development of the mathematical 
theory used in this connexion has been given by N. Wiener (1930). Translating to 
the terminology of the present study, this theory is based upon an analysis of the 
function 

(21)    $Q(u) = \lim_{z \to \infty} \frac{1}{2z} \int_{-z}^{z} (x(t+u) - m) \cdot (x(t) - m)\, dt,$

where 

$m = \lim_{z \to \infty} \frac{1}{2z} \int_{-z}^{z} x(t)\, dt.$

It is seen that if $x(t)$ is given by (17), $Q(u)$ reduces to 

(22)    $Q(u) = \frac{1}{2} \sum_{k=1}^{s} C_k^2 \cos \lambda_k u.$

Modifying by a constant factor the function $S(\lambda)$ defined by Wiener, this reduces 
in the non-complex case to 

(23)    $S(\lambda) = \frac{2}{\pi} \int_0^{\infty} \frac{Q(u)}{Q(0)} \cdot \frac{\sin \lambda u}{u}\, du.$

$S(\lambda)$ is called the integrated periodogram of $x(t)$, for in the case of superposed 
harmonics (17), $S(\lambda)$ as given by (23) reduces to (20) (see e. g. H. S. Carslaw (1930), 
p. 322). Wiener shows that $S(\lambda)$ is always a non-decreasing function, and that there 
exist functions $x(t)$ with continuous integrated periodograms (23). 

N. Wiener applies the generalized harmonic analysis also to functions defined 
by a random scheme. For instance, if $x(t)$ in integral intervals is 1 or $-1$ with 
equal probability, and if the values taken on in different intervals are independent, 
$S(\lambda)$ is with probability 1 given by 

(24)    $S(\lambda) = \frac{2}{\pi} \int_0^{\lambda} \frac{1 - \cos u}{u^2}\, du.$


5. On applied harmonic analysis. 

Let an observational time series be represented by (1). If we 
wish to apply the scheme (17), the primary problem is to find 
for the parameters involved numerical values yielding as good fit 
as possible to the empirical data. 

In case the observational data are strictly periodic, say with 
period $p$, an application of (15) by means of the Fourier analysis 
will yield a perfect fit. It will be sufficient to consider the data 
$\xi_1, \ldots, \xi_p$ ranging over one period. However, since the formulae 
become particularly simple in case of an even period, and since we 
may take the double period for a basis if $p$ is odd, let it be assumed 
that $h = p = 2q$. The Fourier formulae for the $2q$ parameters $m$, $A_q$, 
$A_k$, $B_k$, where $k = 1, 2, \ldots, q-1$, then read as follows (see e. g. 
H. S. Carslaw (1930), p. 325) 

(25)    $m = \frac{1}{p} \sum_{t=1}^{p} \xi_t, \quad A_q = \frac{1}{p} \sum_{t=1}^{p} \xi_t \cos \pi t,$

        $A_k = \frac{1}{q} \sum_{t=1}^{p} \xi_t \cos \frac{\pi}{q} k t, \quad B_k = \frac{1}{q} \sum_{t=1}^{p} \xi_t \sin \frac{\pi}{q} k t.$

In the approach (17), the essential problem is to evaluate the 
frequency numbers $\lambda_k$. The principal method is that of A. Schuster 
((1898), (1900)) which is based upon the construction of an empirical 
periodogram, say $C^2(\lambda)$. The formulae required are (see e.g. K. 
Stumpff (1927), p. 103) 

(26)    $A(\lambda) = \frac{2}{n} \sum_{t=1}^{n} (\xi_t - m) \cdot \cos \lambda t, \quad 0 < \lambda \le \pi,$

        $B(\lambda) = \frac{2}{n} \sum_{t=1}^{n} (\xi_t - m) \cdot \sin \lambda t, \quad 0 < \lambda \le \pi;$

(27)    $C^2(\lambda) = A^2(\lambda) + B^2(\lambda).$

For $\lambda = \pi$, the factor 2 in $A$ and $B$ must be omitted. 

A graph of the curve $C^2(\lambda)$ presents characteristic maxima, the 
abscissae $\lambda_k$ of which are taken for the frequency numbers sought 
for. The corresponding parameters $A_k = A(\lambda_k)$ and $B_k = B(\lambda_k)$ are 
obtained from (26). 
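
The formulae (26) and (27) lend themselves to direct computation. The following Python sketch evaluates the empirical periodogram on a grid of frequencies and reads off the maximum. It rests on assumptions that are not in the text: the name `schuster_ordinates`, the use of the arithmetical mean of the data for $m$, the grid $\lambda = 2\pi j/n$, and an artificial test series containing a single harmonic with a whole number of periods.

```python
import math

def schuster_ordinates(xs, lam):
    """Return A(lam), B(lam), C^2(lam) as in (26)-(27); xs[0] is the value at t = 1."""
    n = len(xs)
    m = sum(xs) / n                                   # empirical mean used for m
    factor = 1.0 if math.isclose(lam, math.pi) else 2.0
    a = factor / n * sum((x - m) * math.cos(lam * t) for t, x in enumerate(xs, start=1))
    b = factor / n * sum((x - m) * math.sin(lam * t) for t, x in enumerate(xs, start=1))
    return a, b, a * a + b * b

# Hypothetical test series: one harmonic with 5 whole periods in n = 40 points.
n, lam0 = 40, 2 * math.pi * 5 / 40
series = [3.0 + 2.0 * math.cos(lam0 * t) + 1.0 * math.sin(lam0 * t)
          for t in range(1, n + 1)]

# Scan the frequencies 2*pi*j/n, j = 1, ..., n/2, and pick the largest ordinate.
grid = [2 * math.pi * j / n for j in range(1, n // 2 + 1)]
peak = max(grid, key=lambda lam: schuster_ordinates(series, lam)[2])
a0, b0, c2 = schuster_ordinates(series, peak)
```

In this idealized case the maximum falls at the true frequency and (26) returns the parameters $A = 2$, $B = 1$ of the harmonic. With real data the maxima are blurred, and the significance of an ordinate has to be judged separately.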

The periodogram method gives much valuable information about 
the series under investigation, but is rather inconvenient. However, 
a careful fit of (17) to observational data seems to necessitate 
tedious computations, so the labour seems due to the problem, not 
to the method. Nevertheless, many simplified and, accordingly, 
approximate methods have been proposed. One type of these is of 
interest for the sequel because it is based upon the differential or 
difference relations satisfied by a sum of harmonics. The first method 
of this kind, that of S. Oppenheim (1909), utilizes the fact that for 
arbitrary $C_k$ and $\varphi_k$ the function $x(t)$ given by (17) satisfies a 
differential equation of $2s$:th order, 

(28)    $x^{(2s)}(t) + g_1 \cdot x^{(2s-2)}(t) + \cdots + g_{s-1} \cdot x^{(2)}(t) + g_s \cdot [x(t) - m] = 0,$

with constant coefficients $g_i$ such that the equation 

(29)    $z^s + g_1 \cdot z^{s-1} + \cdots + g_{s-1} \cdot z + g_s = 0$

has the roots $-\lambda_1^2, -\lambda_2^2, \ldots, -\lambda_s^2$. Identifying $\xi_t$ with $x(t)$, and 
taking $m = \bar m$, S. Oppenheim uses the identity (28) for $s$ successive 
$t$-values, the $g_i$ being so far undetermined. These relations are 
considered a system determining the $g_i$'s. Inserting the resulting 
$g_i$'s in (29), the solving of this equation gives, finally, the frequency 
numbers $\lambda_k$ desired. 

The derivatives required for the system of identities (28) S. Oppenheim 
obtains from the differences of the observational series $\xi_t$ by the 
well-known serial development (see e. g. E. T. Whittaker and 
G. Robinson (1926), p. 64). The intricate passage from differences 
to differentials is avoided in the modification of the Oppenheim 
method given by H. Bruns (1911), who starts from a certain identity 
between central differences satisfied by $x(t)$, viz. 

(30)    $\Delta^{2s} x(t-s) + h_1 \cdot \Delta^{2s-2} x(t-s+1) + \cdots + h_{s-1} \cdot \Delta^2 x(t-1) + h_s [x(t) - m] = 0.$

Here the constant coefficients $h_i$ are such that the roots of the 
equation 

$z^s + h_1 \cdot z^{s-1} + \cdots + h_{s-1} \cdot z + h_s = 0$

are $-q_k^2$, where 

(31)    $q_k = 2 \sin \lambda_k/2.$
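
For a single harmonic ($s = 1$) the identity (30) reads $\Delta^2 x(t-1) + q_1^2 [x(t) - m] = 0$, so that $h_1 = q_1^2$. The following Python sketch verifies this case and shows how the relation can be read backwards to recover the frequency from three adjacent values. The helper name `second_central_difference` and all numerical values are assumptions chosen for illustration, and the constant $m$ is taken as known.

```python
import math

def second_central_difference(x, t):
    """Delta^2 x(t-1) = x(t+1) - 2*x(t) + x(t-1)."""
    return x(t + 1) - 2 * x(t) + x(t - 1)

# Hypothetical single harmonic of type (17) with s = 1.
m, c, lam, phi = 10.0, 3.0, 0.7, 0.4
x = lambda t: m + c * math.cos(lam * t + phi)

q = 2 * math.sin(lam / 2)                       # relation (31)
# Identity (30) for s = 1: Delta^2 x(t-1) + q^2 * (x(t) - m) = 0.
residuals = [second_central_difference(x, t) + q * q * (x(t) - m)
             for t in range(1, 30)]

# Reading the identity backwards recovers the frequency from x(t-1), x(t), x(t+1).
t0 = 5
q_squared = -second_central_difference(x, t0) / (x(t0) - m)
lam_recovered = 2 * math.asin(math.sqrt(q_squared) / 2)
```

The recovery is exact only because the series is a pure harmonic. The division by $x(t) - m$ indicates why a superposed random component, however small, may upset the method.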

The fact that different functional schemes may give a good fit 
gives rise to the question of which scheme should be preferred 
when analysing a given time series. This is a particular aspect of 
the general test problem which is fundamental in all practical ap- 
plications. However, the most important aspects of the test prob- 
lem belong to the theory of probability, and will be touched upon 
later. As purely functional test methods may be regarded those 
which in principle consist in an extension — extrapolation or, some- 
times, interpolation — of the observational material, and a com- 
parison with the corresponding values of the functions fitted to the 
original data. 


6. On the linear difference equation. 

A class of functions of importance in the sequel, though not as 
a scheme for time series, is formed by the solutions of linear differ- 
ence equations with constant coefficients, 

(32)    $(x(t) - m) + a_1 \cdot (x(t-1) - m) + \cdots + a_h \cdot (x(t-h) - m) = 0, \quad a_h \ne 0.$

Writing (30) on the form (32), the resulting sequence $a_i$ will be 
symmetrical. Now, since (30) is a special case of (32), the solutions 
of (32) embrace (17), and are well-known to be (see e. g. P. M. 
Marples (1932)) 

(33)    $x(t) = m + \sum_{k=1}^{i} H_{m_k-1}(t) \cdot p_k^t + \sum_{k=1}^{j} [H'_{n_k-1}(t) \cdot \cos \lambda_k t + H''_{n_k-1}(t) \cdot \sin \lambda_k t] \cdot q_k^t,$

where $H_r$ stands for a polynomial of order $r$. While the polynomials 
$H$ are of arbitrary coefficients, their orders are the same in 
all solutions. In fact, the orders are determined by the characteristic 
equation of (32), viz. 

(34)    $z^h + a_1 \cdot z^{h-1} + \cdots + a_{h-1} \cdot z + a_h = \prod_{k=1}^{i} (z - p_k)^{m_k} \cdot \prod_{k=1}^{j} (z^2 + 2 s_k \cdot z + q_k^2)^{n_k} = 0,$

where the factors in the second member are real. 

The frequencies $\lambda_k$ in (33) are connected with the characteristic 
equation by the relations 

(35)    $\cos \lambda_k = -s_k/q_k.$

The asymptotical behaviour of $x(t)$ is dependent on the exponential 
factors $p_k^t$ and $q_k^t$, the bases of which are likewise seen to be 
uniquely determined by (34). 

For later application it should be noticed that a necessary 
and sufficient condition for the convergence of $\sum_{t=1}^{\infty} (x(t) - m)^2$ and 
$\sum_{t=1}^{\infty} |x(t) - m|$, for any values taken on by the arbitrary coefficients 
in the polynomials $H$, is that $|p_k| < 1$ and $|q_k| < 1$ for all $k$. Another 
wording of the condition is that all roots of the characteristic 
equation (34) shall lie within the periphery of the unit circle. 
In such a case, $x(t)$ will be referred to as describing a damped 
oscillation. 
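
The condition on the roots may be illustrated for the order $h = 2$, where the characteristic equation (34) is a quadratic. The Python sketch below is an illustration under assumptions: the coefficients $a_1 = -1.2$, $a_2 = 0.81$, the starting values and the function names are chosen freely, and $m$ is put equal to zero.

```python
import cmath
import math

def characteristic_roots(a1, a2):
    """Roots of z^2 + a1*z + a2 = 0, the case h = 2 of (34)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + disc) / 2, (-a1 - disc) / 2

def solve_difference_equation(a1, a2, x0, x1, steps):
    """Iterate x(t) + a1*x(t-1) + a2*x(t-2) = 0 (taking m = 0)."""
    xs = [x0, x1]
    for _ in range(steps):
        xs.append(-a1 * xs[-1] - a2 * xs[-2])
    return xs

# Hypothetical coefficients giving a complex root pair of modulus 0.9.
a1, a2 = -1.2, 0.81
roots = characteristic_roots(a1, a2)
q = math.sqrt(a2)                    # z^2 + 2*s*z + q^2 with s = a1/2
lam = math.acos(-(a1 / 2) / q)       # relation (35)

xs = solve_difference_equation(a1, a2, 1.0, 0.5, 300)
absolute_sum = sum(abs(v) for v in xs)
```

Both roots have modulus 0.9 and argument $\pm\lambda$, the solution dies out, and the sum of absolute values stays bounded however far the iteration is carried.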

A second property of (32) will also be used later. Let the arbitrary 
coefficients in (33) be fixed under the single condition that 
no polynomial $H$ vanishes; then, if some $p_k$ or some $q_k$ is different 
from unity in modulus, formula (33) shows that $|x(t)|$ cannot possibly 
be uniformly bounded in $(-\infty < t < \infty)$. In the same way, 
any solution which does not belong to an equation of lower order 
is unbounded if (34) presents a multiple root. Thus, if a solution 
$x(t)$ of (32) satisfies no linear equation of lower order and if $|x(t)|$ is 
uniformly bounded in $(-\infty < t < \infty)$, then $|p_k| = |q_k| = m_k = n_k = 1$, 
i. e. the equation (32) is of the special type (30), and $x(t)$ is of 
type (17). 


7. A purely probabilistic approach. 

A typical probabilistic hypothesis about a given time series is to 
regard the observational values as a random sample of a certain 
aleatory variable. The complete hypothetical set-up thus consists 
of a sequence of random variables, say $\ldots, \eta(t-1), \eta(t), \eta(t+1), 
\ldots$, which are mutually independent, and have identical distribution 
functions, say $F(x)$. 

Since the hypothesis under consideration consists of two elements, 
the test methods are of two kinds: a) those testing the goodness of 
fit of the hypothetical function $F(x)$ to the empirical distribution 
function $F^*(x)$ characterizing the observational series, and b) those 
testing the randomness in the observational series. 

A perfect fit to the data being in contrast to the idea of randomness, 
an amount of arbitrariness is in place in the choice of an 
hypothetical distribution function $F(x)$ characterizing the aleatory 
variables $\eta(t)$. 

The classical method for testing goodness of fit is the $\chi^2$-method 
of K. Pearson (see e. g. G. U. Yule and M. G. Kendall (1937), 
Chapter 22). Another test, viz. 

$\omega^2 = \int_{-\infty}^{\infty} [F(u) - F^*(u)]^2\, du,$

has been proposed by H. Cramer ((1927) p. 112, and (1928) p. 145), 
and, under the term of $\omega^2$-method (Summenlinienverfahren), by R. 
v. Mises ((1930) p. 316). The latter method, an interesting modification 
of which has been given by N. Smirnoff (1936), does not suffer 
from the well-known arbitrariness implied in the $\chi^2$-method. 

The hypothetical randomness lies in the independence relations of 
the type (8). Accordingly, the tests of randomness are tests examining 
various particular instances of these relations. For example, 
since $r_k = 0$ for $k > 0$, the serial coefficients must approximate 
zero for $k > 0$ (the particular case $k = 1$ is the Abbe-Helmert criterion; 
cf. K. Stumpff (1927), p. 8). 

An important instance of the general test problem is concerned 
with the choice between functional and probabilistic schemes. This 
problem is dealt with in the expectance theory (see e. g. K. Stumpff 
(1927), p. 115) founded by A. Schuster ((1898), (1900)), which 
studies, i. a., the distribution of the periodogram ordinates $C^2(\lambda)$ 
obtained when substituting a set of independent random variables 
$\eta(t)$ (cf. p. 11) for (1) in (26). Taking $m = E[\eta]$ in (26), the basic 
formulae of the expectance theory read 

(36)    $E[A(\lambda)] = E[B(\lambda)] = 0, \quad 0 < \lambda \le \pi.$

(37)    $E[C^2(\lambda)] = \frac{4}{n} D^2(\eta), \quad 0 < \lambda < \pi.$

In case $\lambda = \pi$, the factor 4 in the latter formula must be omitted. 
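
Formula (37) may be checked by a small sampling experiment. The Python sketch below is illustrative only and rests on assumptions not made in the text: the variables $\eta(t)$ are taken as normal with mean 0 and unit variance, the values $n = 50$, $\lambda = 1$ and the number of replications are arbitrary, and the name `ordinate` is hypothetical.

```python
import math
import random

def ordinate(xs, lam, m):
    """C^2(lam) from (26)-(27) with a known mean m; xs[0] is the value at t = 1."""
    n = len(xs)
    a = 2 / n * sum((x - m) * math.cos(lam * t) for t, x in enumerate(xs, start=1))
    b = 2 / n * sum((x - m) * math.sin(lam * t) for t, x in enumerate(xs, start=1))
    return a * a + b * b

random.seed(0)                        # fixed seed so the run is repeatable
n, replications, sigma, lam = 50, 4000, 1.0, 1.0
total = 0.0
for _ in range(replications):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    total += ordinate(sample, lam, 0.0)
mean_ordinate = total / replications  # sampling estimate of E[C^2(lam)]
expected = 4 * sigma ** 2 / n         # right-hand side of (37)
```

The average ordinate over the replications lies close to $4 D^2(\eta)/n$. A periodogram ordinate of a given series is to be judged against this level before it is taken as evidence of a harmonic component.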

Having now touched upon some typical functional and probabil- 
istic schemes set up for the analysis of time series, we are in a 
position to pass on to some intermediary schemes. Of these, two types 
may be distinguished which are of fundamentally different character. 
Since the terminology does not seem to be established, I propose 
for the two types the names "schemes of hidden periodicities" and 
"schemes of linear regression". The former schemes are the earlier 
ones, and are dealt with in the next section. Some critical remarks 
on these schemes follow in section 9. The chapter concludes with 
some preliminary remarks on the schemes of linear regression. 


8. A scheme of hidden periodicities. 

A simple approach of hidden periodicities is to regard an observational 
time series (1) as additively built up by a sum of harmonics 
and a random sample of a certain aleatory variable. Thus, denoting 
by 

(38)    $\xi(1), \xi(2), \ldots, \xi(n)$

the hypothetical random variables corresponding to the observational 
set (1), we have in this case 

(39)    $\xi(t) = y(t) + \eta(t),$

where $y(t)$ is of type (17), and $\eta(t)$ is a set of independent random 
variables as used in the previous section. 

A chief problem when applying a scheme of hidden periodicities 
is to perform a separation of the functional and the random components. 
Since we are concerned with the case when $y(t)$ is a 
composed harmonic (17), the variate difference method, and other 
methods based on the assumption that $y(t)$ reduces to a polynomial 
or to a trend function, fall outside of the program (cf. p. 1). 

The principal method for the search of harmonic components in 
a given time series (1) is the Schuster periodogram method de- 
scribed in section 5. The test problem concerning the significance 
of the ordinates in the empirical periodogram was already mentioned 
in section 7. 

In respect to the Oppenheim-Bruns method for separating har- 
monic components (cf. section 5), it has been emphasized by J. I. 
Craig (1916) that the method fails when a random component is 
superposed on the harmonics. This disturbing effect of the random 
error will be called the "Craig effect". As the Oppenheim-Bruns 
method is parallel to a method of importance in the sequel, and 
as the Craig effect — possibly because of the sketchy character 
of Mr. Craig’s paper — seems to have been overlooked in later 
literature, the point in question will be taken up in a separate 
discussion. This is done in section 28. 

The problem of separating the functional and probabilistic ele- 
ments being, of course, to a considerable extent indeterminate, even 
rough methods for the search of periodicities may be of interest. 
A simple method, which is of particular relevance because it is 
both convenient and capable of delivering more general periodic 
components than harmonic functions, is based on the well-known 
Buys-Ballot table (see e. g. K. Stumpff (1927), p. 100): 

(40)    $\begin{array}{llll} \xi_1, & \xi_2, & \ldots, & \xi_p \\ \xi_{p+1}, & \xi_{p+2}, & \ldots, & \xi_{2p} \\ \ldots & & & \\ \xi_{(k-1)p+1}, & \xi_{(k-1)p+2}, & \ldots, & \xi_{kp} \\ \xi_{kp+1}, & \xi_{kp+2}, & \ldots, & \xi_n \end{array}$

(41)    $\bar m_1, \bar m_2, \ldots, \bar m_{n-kp}, \ldots, \bar m_p$

Here $p$ is an integer, $k$ stands for the greatest integer which is 
less than $n/p$, and $\bar m_i$ is the arithmetical mean of the $i$:th column. 
Denoting by $k_i$ the number of elements in the $i$:th column, 
we have 

(42)    $k_i = k + 1$ for $i \le n - kp$, 
        $k_i = k$ for $i > n - kp$. 

The leading idea of the method simply is that if the series $\xi_t$ 
contains a component $y(t)$ with period $p$, the values of $y(t)$ for 
$t = 1, 2, \ldots, p$ are approximately given by the means $\bar m_1, \bar m_2, \ldots, \bar m_p$. 
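
The arrangement (40) to (42) amounts to collecting every $p$:th observation in one column. A minimal Python sketch, with the hypothetical function name `buys_ballot` and an artificial series of $n = 10$ values, may serve as illustration.

```python
def buys_ballot(xs, p):
    """Return (column means, column counts) of the table (40) for trial period p."""
    columns = [xs[i::p] for i in range(p)]      # column i holds xi_{i+1}, xi_{i+1+p}, ...
    means = [sum(col) / len(col) for col in columns]
    counts = [len(col) for col in columns]
    return means, counts

# Hypothetical series: the pattern 1, 2, 3, 4 repeated, n = 10, trial period p = 4.
series = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
means, counts = buys_ballot(series, 4)
```

Here $k = 2$ and $n - kp = 2$, so the first two columns contain $k + 1 = 3$ elements and the remaining ones $k = 2$, in agreement with (42), and the column means reproduce the periodic pattern.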

While the arrangement (40) was used even before C. H. D. Buys- 
Ballot (1847), the method was developed in detail by B. Stewart 
and W. Dodgson (1879) and others (cf. H. Burkhardt (1904) p. 679 f.). 
A sharpening of the method by means of a periodogram construction, 
generalizing that of A. Schuster, is due to E. T. Whittaker 
(1911). In the Whittaker periodogram, p is taken for abscissa, 
and the ordinate in $p$ equals the (weighted) variance in the series 
$\bar m_i$ divided by the variance $D^2(\xi)$ in the series $\xi_t$. 

The connexion between the two periodogram constructions is 
interesting. Considering the Schuster periodogram, $\frac{1}{2} C^2(\lambda)$ is 
well-known to be approximated by the maximum value for varying $A$ 
and $B$ of the expression 

(43)    $D^2(\xi) - \frac{1}{n} \sum_{t=1}^{n} (\xi_t - \bar m - A \cdot \cos \lambda t - B \cdot \sin \lambda t)^2.$

On the other hand, Professor H. Cramer, in his Course in 1933, 
showed that in the Whittaker periodogram the ordinate equals, 
apart from the constant denominator $D^2(\xi)$, the maximum of an 
expression generalizing (43), viz. 

(44)    $D^2(\xi) - \frac{1}{n} \sum_{t=1}^{n} [\xi_t - \bar m - y_p(t)]^2.$

This expression should be maximized under the condition that $y_p(t)$ 
be a function of integral period $p$, but arbitrary for the rest.* 
With the notations used in (40), the maximum value of (44) 
equals $\frac{1}{n} \sum_{i=1}^{p} k_i (\bar m_i - \bar m)^2$, which for the ordinates in the Whittaker 
periodogram gives 

(45)    $\frac{\sum_{i=1}^{p} k_i (\bar m_i - \bar m)^2}{n \cdot D^2(\xi)}.$

In case $n$ is a multiple of $p$, (45) reduces to 

(45 a)  $\frac{\sum_{i=1}^{p} (\bar m_i - \bar m)^2}{p \cdot D^2(\xi)} = D^2(\bar m_i)/D^2(\xi).$

* The interpretation of the generalized periodogram as a correlation ratio is 
incorrect (cf. E. T. Whittaker and G. Robinson (1926), p. 345 f.). 

In his 1933 Course, Prof. H. Cramer also delivered an expect- 
ance theory of the Whittaker periodogram. He showed, i. a., 
that if the given series is a random sample of a normally distrib- 
uted variable, then (45 a) is distributed as a variance ratio in Fisher’s 
analysis of variance. 
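
The ordinate (45) of the Whittaker periodogram is readily computed from the Buys-Ballot columns. The Python sketch below is an illustration under assumptions: the name `whittaker_ordinate` is hypothetical, the variance $D^2(\xi)$ is formed with divisor $n$, and the test series is artificial. The distribution theory mentioned above is not reproduced.

```python
def whittaker_ordinate(xs, p):
    """Ordinate (45): weighted variance of column means over the variance of xs."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    columns = [xs[i::p] for i in range(p)]
    # Each column mean is weighted by the number k_i of its elements.
    between = sum(len(col) * (sum(col) / len(col) - mean) ** 2 for col in columns) / n
    return between / variance

# Hypothetical series of strict period 4, examined at trial periods 4 and 3.
series = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
at_true_period = whittaker_ordinate(series, 4)
at_wrong_period = whittaker_ordinate(series, 3)
```

For a strictly periodic series the ordinate at the true period equals 1, since the column means then account for the whole variance, while at a trial period foreign to the series it stays near 0.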

It should be observed that as a principle the Schuster and the 
Whittaker periodograms are of equal validity — if the former 
discovers periods in a given observational series, then the latter 
will give positive results, and vice versa. 


9. On the criticism of the scheme of hidden periodicities. 

While the hypothesis of hidden periodicities has proved very fruitful 
in many fields of scientific research, many applications early met with 
a serious criticism (cf. H. Burkhardt (1904) p.685). The essential 
point of criticism bears upon the postulated strict periodicity of the 
individual functional components, and it has been maintained that this 
rigidity in the periodicity often has no empirical correspondence. 

The serial and autocorrelation coefficients disclose an interesting 
aspect of the above critical argument. If the functional element 
$y(t)$ in (39) is a composed harmonic (17), the autocorrelation 
coefficients $r_k$ are for $k \ne 0$ given by 

(46)    $r_k = \sum_{i=1}^{s} C_i^2 \cdot \cos \lambda_i k \Big/ \Big(2 D^2(\eta) + \sum_{i=1}^{s} C_i^2\Big).$

The hypothesis of hidden periodicities assumes, therefore, that, for 
$k \ne 0$, the autocorrelation coefficient $r_k$, too, is a function of the 
type (17), i. e. a composed harmonic. According to the relation (19), 
the hypothesis thus implies that there exist arbitrarily large $k$-values 
such that $r_k$ is approximately given by the value taken by (46) for 
$k = 0$, viz. 

(47)    $\sum_{i=1}^{s} C_i^2 \Big/ \Big(2 D^2(\eta) + \sum_{i=1}^{s} C_i^2\Big).$

This implication may, as a matter of fact, be used as a criterion 
of the applicability of the hypothesis of hidden periodicities. Thus, 
if a given time series clearly shows a cyclical character while 
the serial coefficients are gradually vanishing, then the scheme 
of hidden periodicities is no adequate approach. 
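
The non-vanishing of the autocorrelation coefficients under the hypothesis of hidden periodicities is easily exhibited from (46). In the Python sketch below, the function name `autocorrelation` and the numerical scheme (one harmonic of period 12 and amplitude 2, random component of unit variance) are assumptions made for illustration.

```python
import math

def autocorrelation(k, amps, freqs, noise_variance):
    """r_k from (46) for k != 0, for harmonics plus an independent random component."""
    denominator = 2 * noise_variance + sum(c * c for c in amps)
    return sum(c * c * math.cos(f * k) for c, f in zip(amps, freqs)) / denominator

# Hypothetical scheme: one harmonic of period 12, amplitude 2, noise variance 1.
amps, freqs, noise_variance = [2.0], [2 * math.pi / 12], 1.0
limit_value = sum(c * c for c in amps) / (
    2 * noise_variance + sum(c * c for c in amps))      # expression (47)
r_at_multiples = [autocorrelation(12 * j, amps, freqs, noise_variance)
                  for j in range(1, 6)]
r_half_period = autocorrelation(6, amps, freqs, noise_variance)
```

At every multiple of the period the coefficient returns to the value (47), here 2/3, however large the lag. Serial coefficients that fade away with increasing lag are therefore at variance with the scheme.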

It seems plausible that in a good many oscillatory phenomena 
the serial coefficients actually are gradually vanishing. The above 
criterion shows that in such cases a periodogram analysis would 
give negative results. The table of serial coefficients in air pressure 
material from Port Darwin analyzed by Sir G. Walker ((1931), 
p. 528) may be referred to for illustration. The graph of serial 
coefficients presents damped oscillations of a period of about three 
years. 

It follows from the above that in a descriptive analysis of a 
time series the serial coefficients of G. U. Yule are of fundamental 
importance. J. Bartels (1935) has recently given another method 
of descriptive analysis, consisting in a generalization of the Buys- 
Ballot table, and directly constructed as a criterion of the applica- 
bility of the hypothesis of hidden periodicities. J. Bartels forms 
the Buys-Ballot table for successively extended sections of the 
given series. Let $S(k, \nu)$ stand for the $\nu$:th section consisting of $k$ 
consecutive rows in (40), and let $q$ be the number of such sections. 
Let $D_p^2(k, \nu)$ be the variance of the column averages $\bar m_i(k, \nu, p)$ in 
the section $S(k, \nu)$, and let their arithmetical mean in respect of $\nu$ 
be $J_p(k)$, 

$J_p(k) = \frac{1}{q} \sum_{\nu=1}^{q} D_p^2(k, \nu).$

For the expression 

(48)    $B_p(k) = k \cdot J_p(k)/J_p(1),$

regarded as a function of $k$, J. Bartels proposes the term persistency 
characteristic because of the following observations. 

Let the series $\xi_t$ under investigation contain a component of persistent 
period, i.e. a strictly periodic component. If the length of 
the period equals $p$, and if $k$ is large, each of the $q$ sets $\bar m_i(k, \nu, p)$, 
considering the variation with $i$, will nearly reproduce the periodic 
component. Consequently, $J_p(k)$ will tend to a positive constant 
as $k \to \infty$, and $B_p(k)$ will therefore increase nearly proportionately 
with $k$. On the other hand, if the time series is purely random, 
$B_p(k)$ will show no tendency to vary with $k$, but will remain on the 
unity level. Concerning intermediate cases, J. Bartels remarks 
that $B_p(k)$ may tend to an asymptote above unity. Then a quasi-persistency 
is present, a tendency of adjacent rows in the Buys-Ballot 
table to show a certain resemblance, a resemblance which 
will fade away as more distant rows are compared. 
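
The two extreme cases described above may be reproduced numerically. The following Python sketch computes the expression (48) under assumptions that are not drawn from the source: sections are formed from non-overlapping blocks of $k$ complete rows, variances are taken with divisor $p$, the random series is normal, and the names `column_mean_variance` and `persistency` are hypothetical.

```python
import random

def column_mean_variance(rows):
    """Variance of the column averages of a block of Buys-Ballot rows."""
    p = len(rows[0])
    means = [sum(row[i] for row in rows) / len(rows) for i in range(p)]
    centre = sum(means) / p
    return sum((m - centre) ** 2 for m in means) / p

def persistency(xs, p, k):
    """B_p(k) of (48), using non-overlapping sections of k complete rows (an assumption)."""
    rows = [xs[i:i + p] for i in range(0, len(xs) - p + 1, p)]

    def j_mean(width):
        sections = [rows[s:s + width] for s in range(0, len(rows) - width + 1, width)]
        return sum(column_mean_variance(sec) for sec in sections) / len(sections)

    return k * j_mean(k) / j_mean(1)

periodic = [1.0, 3.0, 2.0, 5.0] * 12            # strictly periodic, p = 4
b_periodic = [persistency(periodic, 4, k) for k in (1, 2, 3, 4)]

random.seed(0)                                  # fixed seed so the run is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(12000)]
b_noise = persistency(noise, 4, 6)
```

For the strictly periodic series the characteristic equals $k$, whereas for the purely random series it stays in the neighbourhood of unity.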

The persistency characteristics computed by J. Bartels ((1935) 
p. 519 f.) suggest persistency in statistical data concerning a) the 
half year period in the international index of terrestrial magnetism, 
b) the 24 hour wave in air pressure in Potsdam , c) the 12 hour 
component of the Batavia temperature. On the other hand, quasi- 
persistency is suggested in a) the 27 day component in terrestrial 
magnetism, b) the 24 hour wave in the magnetic east component 
in Batavia. 

In economics, the classical periodogram analysis has repeatedly 
been tried on business cycle material. The negative results support 
an opinion which has been maintained also on logical-theoretical 
grounds, and which now seems predominant, viz. that the hypothesis 
of hidden periodicities is inadequate in business cycle theory. 

In cases like those mentioned above, where the approach of rigid 
periodicity fails, the schemes of linear regression announced in 
section 7 seem to form a natural and interesting substitute for the 
scheme of hidden periodicities. Reference to previous results con- 
cerning the schemes of linear regression being given later, when 
dealing systematically with these schemes, the concluding section in 
this survey will enter into detail only as to the earliest papers on 
the schemes in question. 


10. Remarks on the schemes of linear regression. 

In the hypothesis of hidden periodicities, there is assumed a 
far-reaching interdependence between the elements of the given time 
series; leaving the random component out of the question, the 
interdependence is assumed to be purely functional. The schemes of 
linear regression assume as to adjacent elements an interdependence 
only in the sense of the theory of probability. 

In an interesting study on the variate difference method, G. U. 
Yule (1921) considers the autocorrelation in a series consisting of 
iterated differences, say of order $m$, obtained from a purely random 
series. Since the autocorrelation coefficients are 

(49)    $r_k = (-1)^k \cdot \frac{m(m-1) \cdots (m-k+1)}{(m+1)(m+2) \cdots (m+k)}, \quad 0 \le k \le m,$

the series of differences must present an oscillatory character, a 
feature increasing in evidence with $m$. In other words, we are 
concerned with a primary sequence of random variables, say $\ldots, 
\eta(t-1), \eta(t), \eta(t+1), \ldots$, which by hypothesis are independent, 
and have identical distribution functions; on this basis a secondary 
series, say $\ldots, \xi(t-1), \xi(t), \xi(t+1), \ldots$, is defined by means of 
a moving linear operation of the type 

(50)    $\xi(t) = b_0 \eta(t) + b_1 \eta(t-1) + \cdots + b_h \eta(t-h).$

The approach thus defined will in the sequel be called the scheme 
of moving averages. 
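
Formula (49) may be verified by regarding the $m$:th difference as a moving average (50) with weights $b_j = (-1)^j \binom{m}{j}$. The Python sketch below compares the autocorrelation computed directly from these weights with the right-hand side of (49), using exact rational arithmetic. The function names are assumptions, and the primary series is supposed uncorrelated with finite variance.

```python
from fractions import Fraction
from math import comb

def autocorr_direct(m, k):
    """Lag-k autocorrelation of m-th differences of an uncorrelated series."""
    weights = [(-1) ** j * comb(m, j) for j in range(m + 1)]   # difference operator
    covariance = sum(weights[j] * weights[j + k] for j in range(m + 1 - k))
    variance = sum(w * w for w in weights)
    return Fraction(covariance, variance)

def autocorr_formula(m, k):
    """Right-hand side of (49)."""
    value = Fraction((-1) ** k)
    for j in range(k):
        value *= Fraction(m - j, m + 1 + j)
    return value

# Compare both computations for all orders m = 1, ..., 8 and lags k = 0, ..., m.
agreement = all(
    autocorr_direct(m, k) == autocorr_formula(m, k)
    for m in range(1, 9) for k in range(m + 1)
)
```

The coefficients alternate in sign, and $r_1 = -m/(m+1)$ approaches $-1$ as $m$ increases, which is the source of the oscillatory character noted above.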

Another particular case of moving average (50) is studied by 
E. Slutsky (1927), who forms the secondary, intercorrelated series 
from the primary, purely random series by $n$ iterated summations 
by two, followed by the forming of $m$:th differences. Holding $m/n$ 
constant, E. Slutsky shows that, with probability 1, an arbitrarily 
fixed section of the difference series will tend to a sine curve as 
$n \to \infty$. This result is given as an application of a general theorem 
proved in the same paper and discussed in section 16. 

Stochastical interdependence of the type (50) is a particular case 
of linear regression. Letting the auxiliary variables $\eta(t)$ be the 
same, another type of linear regression is indicated by the following 
implicit definition of the intercorrelated variables $\xi(t)$, 

(51)    $\xi(t) + a_1 \xi(t-1) + \cdots + a_h \xi(t-h) = \eta(t).$

The approach (51) was introduced in an heuristic manner in an 
important memoir by G. U. Yule (1927). The fundamental difference 
between the scheme of hidden periodicities and the scheme 
(51), which is said by Yule to define a "disturbed harmonic" $\xi(t)$, is 
clearly brought out. In (39) the random elements $\eta(t)$ "do not in 
any way disturb the steady course of the underlying periodic function 
or functions" (p. 268). On the other hand, regarding (51) Yule 
(p. 294) states that a principal feature of a disturbed periodic movement 
is "a continual change of amplitude and shift of phase". 

In the same paper G. U. Yule applies, with success, his hypothesis 
to empirical data, viz. A. Wolfer’s sunspot numbers. Further, 
he makes the following general statement concerning the scope of 
the scheme (51): "Disturbance will always arise if the value of the 
variable is affected by external circumstance and the oscillatory variation 
with time is wholly or partly self-determined, owing to the value 
of the variable at any one time being a function of the immediately 
preceding values. Disturbance, as it seems to me, can only be excluded 
if either (1) the variable is quite unaffected by external circumstance, 
or (2) we are dealing with a forced vibration and the external circumstances 
producing this forced vibration are themselves undisturbed" (p. 295). 

In order to attain conformity in terminology, the approach (51) 
will in the sequel be descriptively called the scheme of (linear) 
autoregression. G. U. Yule (1927) restricts himself to the cases 
h <4. General autoregression as implicitly defined by (51) was 
dealt with by Sir G. Walker (1931), whose applications to the 
air pressure data mentioned in section 9 gave positive results. 
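
The behaviour of the scheme (51) may be illustrated by a small simulation. The Python sketch below is not a reconstruction of the computations of Yule or Walker. The order $h = 2$, the coefficients, the zero starting values, the normal distribution of the shocks and the name `simulate_autoregression` are all assumptions made for the example.

```python
import random

def simulate_autoregression(a, shocks):
    """Generate xi(t) from (51): xi(t) = eta(t) - a_1*xi(t-1) - ... - a_h*xi(t-h)."""
    h = len(a)
    xs = [0.0] * h                       # zero starting values (an assumption)
    for eta in shocks:
        xs.append(eta - sum(a[i] * xs[-1 - i] for i in range(h)))
    return xs[h:]

a = [-1.2, 0.81]                         # hypothetical coefficients, roots of modulus 0.9

# A single shock followed by silence gives a damped oscillation.
impulse_response = simulate_autoregression(a, [1.0] + [0.0] * 150)

# Continual shocks keep the oscillation alive with shifting amplitude and phase.
random.seed(0)                           # fixed seed so the run is repeatable
shocks = [random.gauss(0.0, 1.0) for _ in range(500)]
series = simulate_autoregression(a, shocks)

# Check that the generated series satisfies (51) term by term.
residuals = [series[t] + a[0] * series[t - 1] + a[1] * series[t - 2] - shocks[t]
             for t in range(2, len(series))]
```

With a single shock the series dies out like the damped oscillation of section 6, whereas under continual shocks the oscillation is maintained without any fixed amplitude or phase.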

As shown in detail by E. Slutsky (see e.g. (1937)), even a scheme 
of moving averages (50) may present waves of shifting amplitude 
and phase. Thus, both schemes of linear regression are of interest 
to the theory of those oscillatory phenomena for which the hypoth- 
esis of hidden periodicities proves inadequate. The investigations 
referred to in section 9 show that there are many central pheno- 
mena of this kind. 

While the schemes of linear regression thus form a type of 
hypothesis of the greatest importance, the development of the sub- 
ject is still little advanced, both as to the theory and the applica- 
tion of the schemes. For instance, earlier definitions concerning 
the scheme of autoregression are incomplete. One of the chief 
purposes of the present volume is to give some contributions for 
completion in these respects. It also aims at bringing the schemes 
into place in the theory of probability, thereby uniting the rather 
isolated results hitherto reached. 

In the theory of probability, the schemes of linear regression 
fall under the heading of the discrete stationary random process 
as defined by A. Khintchine (see (1932) and (1933)). As a matter 
of fact, the concept of stationary random process is extremely 
general, and the restrictions involved are only those indispensable 
in hypotheses concerning stationary phenomena (cf. p. 3). It is 
therefore but natural that the scheme of hidden periodicities, after 
a slight change in the interpretation, will also be found to form 
a special discrete stationary process (see section 15 13). Accordingly, 
the theoretical developments start with a chapter on the discrete 
random process. This analysis will, i.a., deepen the insight into 
the nature of the schemes of linear regression, which are dealt 
with in Chapter III. Chapter IV contains some applications of 
different hypotheses to observational time series. 

The present work is confined to the time-series aspects of the 
empirical data. Since the serial coefficients play a fundamental part 
in the analysis we note by way of a general caution that, as always 
when dealing with correlation coefficients, it is not a matter of 
pure routine to apply the general methods in practice, and the field 
is full of pitfalls. For one thing, there is the question of the quan- 
titative significance of correlation coefficients. Following up an 
argument presented in a preliminary note [see H. Wold (1936)], 
this question was taken up for discussion in Appendix A of the 
1st edition of this book. To restate the main conclusion, this is 
that for correlation coefficients which are obtained from a broad 
class of time or spatial series their quantitative significance is in- 
fluenced by the size of the statistical masses to which the data 
refer. In particular, this conclusion applies to a broad class of serial 
coefficients. 



CHAPTER II. 


On the theory of the discrete stationary random process.

11. Definition of the stationary processes.

In the theoretical analysis of a statistical time series, we may 
distinguish between functional and probabilistic approaches. In the 
former, the time series is represented by a function of time which 
in the general case is univalent, but otherwise unconditioned. In 
the latter, the most general approach is the unconditioned random 
process. From a purely mathematical viewpoint, the random process 
is a random variable in an infinite number of dimensions. Denoting 
by $\{t\}$ the set of time points $t$ in which the phenomenon changing
with time is studied, each element in $\{t\}$ corresponds to one
dimension in the random variable.

Let $\{t\}$ stand for a set of values taken on by a real parameter
which will be spoken of as representing time, and let one random
variable $\xi(t)$ correspond to each time point $t$ in $\{t\}$. Denoting the
given set of random variables by $\{\xi(t)\}$, let it be assumed that the
following conditions are satisfied.

(A). Choosing arbitrarily a sub-set in $\{t\}$, say $(t) = (t_1, \dots, t_n)$, the
combined variable $\xi(t_1, \dots, t_n) = [\xi(t_1), \dots, \xi(t_n)]$ will be well-defined.

Let the distribution function of $\xi(t_1, \dots, t_n)$ be denoted by
$F(t_1, \dots, t_n; u_1, \dots, u_n)$, so that

(52) $F(t_1, \dots, t_n; u_1, \dots, u_n) = P[\xi(t_1) \le u_1, \xi(t_2) \le u_2, \dots, \xi(t_n) \le u_n]$,

and let the sets of distribution functions and probability functions
of the variables $\xi(t_1, \dots, t_n)$ be denoted by $\{F\}$ and $\{P\}$ respectively.

(B). Letting $(t) = (t_1, \dots, t_n)$ be an arbitrary sub-set in $\{t\}$, and
$(i_1, \dots, i_n)$ be an arbitrary permutation of the sequence $(1, 2, \dots, n)$,
the functions $\{F\}$ will satisfy the following relations identically in
$u_1, \dots, u_n$,

(53) $F(t_{i_1}, \dots, t_{i_n}; u_{i_1}, \dots, u_{i_n}) = F(t_1, \dots, t_n; u_1, \dots, u_n)$;

(54) $F(t_1, \dots, t_m; u_1, \dots, u_m) = F(t_1, \dots, t_n; u_1, \dots, u_m, +\infty, \dots, +\infty)$,

where $m < n$.

These relations express merely that the probability laws ruling 
the variables {£(£)} must not contradict themselves. Accordingly, 
these relations will be referred to as the consistency relations. 

Following A. Kolmogoroff ((1931), (1933)), a set $\{\xi(t)\}$ satisfying
the conditions (A) and (B) will be called a random process.

According to a fundamental theorem of A. Kolmogoroff ((1933),
p. 27), a set $\{F\}$ belonging to a random process $\{\xi(t)\}$ defines a
probability distribution on those sets in a space $R_t$ of an infinite
number of dimensions $\{t\}$, which are formed by an enumerable sum of
Borel's cylinder sets in $R_t$. For instance, if the sequence $t, t-1,
t-2, \dots$ is contained in the set $\{t\}$, the probability
$P[\xi(t) \le u_0, \xi(t-1) \le u_1, \xi(t-2) \le u_2, \dots]$ will exist for any real sequence
$u_0, u_1, u_2, \dots$

We see that the stochastic process extends the notion of random 
variable from a finite to an infinite number of dimensions. The fre- 
quency interpretation in terms of a universe of sample elements 
remains the same. The sample elements of a process $\{\xi(t)\}$, also called
realizations of the process, are functions of $t$, say $\xi_i(t)$. Considering the
universe of realizations, and keeping $t$ fixed, say $t = t_1$, we obtain the
universe of sample values $\xi_i(t_1)$ that constitute the random variable
$\xi(t_1)$. More generally, if we keep $t_1, \dots, t_n$ fixed, the realizations
will give us the universe of sample elements $[\xi_i(t_1), \dots, \xi_i(t_n)]$ that
constitute the $n$-dimensional random variable $[\xi(t_1), \dots, \xi(t_n)]$.

In order to define stationarity, we must consider arbitrary
translations within the set $\{t\}$. In doing this, we can assume that $\{t\}$
either consists of all real $t$-values, or is formed by an unbroken
sequence of equidistant values, say $\dots, -1, 0, 1, 2, \dots$ In any case,
a random process $\{\xi(t)\}$ as defined by a set $\{F\}$ is termed stationary
in the sense of A. Khintchine ((1932)-(1934)), if for an arbitrary
sub-set $(t) = (t_1, \dots, t_n)$ in $\{t\}$ the relation

(55) $F(t_1 + t, t_2 + t, \dots, t_n + t; u_1, \dots, u_n) = F(t_1, t_2, \dots, t_n; u_1, \dots, u_n)$

is identically satisfied in $u_1, \dots, u_n$ and in $t$. Again following
Khintchine, the process will be called discrete, if $t$ is restricted to
a sequence of equidistant values, continuous if $t$ is arbitrary.
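
Relation (55) can be checked exactly on a small example. The following is a minimal sketch, not part of the original text: it assumes a hypothetical two-state Markov chain (the matrix `P`, the starting distributions and all function names are illustrative choices) and compares the joint probabilities of $[\xi(t_1), \xi(t_2)]$ before and after a translation in time. For a two-valued variable the point probabilities determine the distribution functions, so agreement of the tables is agreement in (55) for the chosen pair of time points only.

```python
from fractions import Fraction

# Hypothetical two-state transition matrix, chosen only for illustration.
P = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(1, 2), Fraction(1, 2)]]


def step(dist, P):
    """Advance a row distribution by one time unit: dist -> dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mat_power(P, k):
    """k-step transition matrix P^k, by repeated multiplication."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [[sum(result[i][l] * P[l][j] for l in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def joint(initial, P, t1, t2):
    """Table of P[xi(t1) = a, xi(t2) = b] for 0 <= t1 <= t2,
    the chain being started at time 0 from the distribution `initial`."""
    dist = list(initial)
    for _ in range(t1):
        dist = step(dist, P)
    Pk = mat_power(P, t2 - t1)
    n = len(P)
    return [[dist[a] * Pk[a][b] for b in range(n)] for a in range(n)]


# Started from the invariant distribution, the joint law is unchanged by a shift.
invariant = [Fraction(5, 6), Fraction(1, 6)]
shift_ok = joint(invariant, P, 0, 2) == joint(invariant, P, 3, 5)

# Started from a point mass, the same shift changes the joint law.
point_mass = [Fraction(1), Fraction(0)]
shift_fails = joint(point_mass, P, 0, 2) != joint(point_mass, P, 3, 5)
```

The sketch only tests one pair of time points and one translation; the definition requires (55) for every finite sub-set and every $t$.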
According to the interpretation indicated in section 2, the variables
$\xi(t)$ considered in the above definitions may be taken to be
multi-dimensional. Thus, just as in the case of ordinary random variables,
a $k$-dimensional random process $\{\xi(t)\}$ may be looked upon as
obtained by combining $k$ one-dimensional processes, say $\{\xi^{(1)}(t)\}, \dots,
\{\xi^{(k)}(t)\}$. Now, in studying simultaneously a group of one-dimensional
processes, we shall always assume that an arbitrary finite sub-group,
say $\{\xi^{(i_1)}(t)\}, \dots, \{\xi^{(i_k)}(t)\}$, can be combined into a $k$-dimensional
random process. For instance, considering an infinite sequence
$\{\xi^{(i)}(t)\}$, this assumption may be expressed as follows.

Taking out arbitrarily a group $\{\xi^{(i_1)}(t)\}, \dots, \{\xi^{(i_k)}(t)\}$, fixing arbitrarily
a set of time points $t_1, \dots, t_n$, and a double real sequence $u_s^{(r)}$, where
$r = 1, 2, \dots, k$; $s = 1, 2, \dots, n$, we shall assume that the probability

$P[\xi^{(i_r)}(t_s) \le u_s^{(r)}; \; r = 1, \dots, k; \; s = 1, \dots, n]$

will exist; further, we shall assume that these probabilities will
satisfy all consistency relations of type (53)-(54); in case of
stationarity we shall also assume that all relations of type (55) will hold.

Expectations derived from the distribution functions {F} determin- 
ing a stationary process {£(£)} will be called characteristics of the 
process. The characteristics are, of course, independent of $t$.
Further, considering the distribution functions $F(t; u)$ in the set $\{F\}$,
these will be independent of $t$. The function of $u$ thus obtained
will be termed the principal distribution function of the process
considered. By definition, the mean ($m$), the dispersion ($D$), etc., of
a one-dimensional stationary process are given by the corresponding 
characteristics as obtained from the principal distribution function 
(cf. (5) and (6)). 

If the dispersion of a one-dimensional stationary process is finite,
the automoments of second order as defined by (cf. (10) and (11))

$\nu_2^{(k)} = E[\xi(t) \cdot \xi(t+k)] = \int_{R_2} u v \, d_{u,v} F(t, t+k; u, v) = \nu_2^{(-k)}$

will be finite. The characteristics mentioned determine the
autocorrelation coefficients belonging to the stationary process $\{\xi(t)\}$
considered (cf. (12)),

(56) $r_k = r_k(\xi) = (\nu_2^{(k)} - m^2)/D^2 = r_{-k}$.

If $r_k(\xi) = 0$ for all $k \ne 0$, the process $\{\xi(t)\}$ will be termed
non-autocorrelated.
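
For an observed series, a sample analogue of (56) can be computed as below. This is a minimal sketch under stated assumptions: the function name `serial_correlations` and the normalization (division by the full series length at every lag) are illustrative choices, and the serial coefficients defined earlier in the book may be normalized differently.

```python
def serial_correlations(series, max_lag):
    """Sample analogue of (56): lagged products of deviations from the mean,
    divided by the sample variance. The divisor n at every lag is an
    assumption of this sketch, not taken from the text."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("series is constant; correlations are undefined")
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]


# A strictly alternating series is strongly negatively correlated at lag 1.
alternating = [1.0, -1.0] * 4
r = serial_correlations(alternating, 2)
```

By construction the coefficient at lag 0 equals 1, in agreement with $r_0 = 1$ in (56).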


A. Khintchine (1932) gives also a more embracing definition of
the stationary process, requiring only that the characteristics $m$, $D$
and $r_k$ shall be independent of $t$. This case will be referred to as
the generalized stationary process.

Let $\{\xi(t)\}$ be a stationary process, and consider the variables
$\xi(t_1, \dots, t_m)$ and $\xi(t_1, \dots, t_n)$ which refer to the time points $(t_1, \dots,
t_m, \dots, t_n)$. We shall sometimes have to regard the process $\{\xi(t)\}$ in
the set $(t_{m+1}, \dots, t_n)$ as conditioned by the behaviour of $\{\xi(t)\}$ in the
time points $(t_1, \dots, t_m)$. Following a familiar terminology, we shall
then speak of the variable $\xi(t_{m+1}, \dots, t_n)$ as being conditioned by
$\xi(t_1, \dots, t_m)$. Indicating conditionality by an index $C$, and denoting
by $C_t$ the condition obtained from $C$ by replacing throughout $t_i$ by
$t_i + t$, it is evident that for an arbitrarily fixed $t$ the two conditioned
variables $\xi_C(t_{m+1}, \dots, t_n)$ and $\xi_{C_t}(t_{m+1} + t, \dots, t_n + t)$ will have identical
distribution functions if the process $\{\xi(t)\}$ is stationary. The reader
is referred to A. Kolmogoroff ((1933), Chapter V) for the
fundamentals concerning conditioned variables and distributions.

Generally speaking, a characteristic of a conditioned variable 
depends on the conditioning variable, and will be called a conditioned 
characteristic. Since such a characteristic forms a function of a 
random variable, it constitutes in itself a random variable. For 
instance, considering a one-dimensional process $\{\xi(t)\}$, and taking as
before $\xi(t_1, \dots, t_m)$ to be the conditioning variable, the conditioned
expectation of $\xi(t_{m+1})$ is, by definition, the expectation of $\xi_C(t_{m+1})$.
Denoting this expectation by $E_C[\xi(t_{m+1})]$, a general formula gives (see
A. Kolmogoroff (1933), p. 47)

(57) $E[E_C[\xi(t_{m+1})]] = E[\xi(t_{m+1})]$.

From now on, when not explicitly stated otherwise, the random
processes dealt with are tacitly understood (A) to be one-dimensional,
(B) to be discrete, and defined for integral time points, (C) to have a
finite dispersion. Of course, the confinement to integral time points
instead of a general equidistant sequence, say

$\dots, t_0 - a, t_0, t_0 + a, t_0 + 2a, \dots$

involves no restriction as to the generality of the theory of the
discrete process.
12. A theorem of A. KHINTCHINE.

As far as I am aware, the only earlier investigation of the 
general discrete stationary process is that of A. Khintchine ((1932), 
(1933)) already referred to. Though the problems dealt with in the 
sequel lie along entirely different lines, the principal theorem of 
Khintchine will be quoted in full because it discloses a fundamental 
property of the stationary process. The theorem in question states 
that the stationary process is subjected to the law of great numbers, 
viz. in the following sense: 

Let $\xi(t), \xi(t-1), \dots, \xi(t-n+1)$ be a finite sequence of variables
connected with a discrete stationary process $\{\xi(t)\}$ with finite dispersion,
and put $\Sigma_n = \frac{1}{n} \sum_{i=0}^{n-1} \xi(t-i)$. The dispersion of the difference $\Sigma_n - \Sigma_m$
then tends to zero when $n \to \infty$ and $m \to \infty$. Further, processes may be
constructed so that the asymptotical decrease is arbitrarily slow.
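
The dispersion in the theorem can be evaluated exactly once the autocorrelation coefficients are given, since $\Sigma_n - \Sigma_m$ is a linear form in the variables $\xi(t-i)$. The following is a minimal sketch; the function name, the argument `sigma` and the geometric autocorrelation $r_k = 0.8^{|k|}$ are assumptions made only for this illustration.

```python
import math


def mean_difference_dispersion(r, n, m, sigma=1.0):
    """Dispersion (standard deviation) of Sigma_n - Sigma_m, where r(k) returns
    the autocorrelation coefficient of lag k and sigma is the dispersion of
    the process. Sigma_n - Sigma_m = sum_i c_i * xi(t - i) with the weights
    c_i below, so its variance is sigma^2 * sum_i sum_j c_i c_j r(|i - j|)."""
    size = max(n, m)
    c = [(1.0 / n if i < n else 0.0) - (1.0 / m if i < m else 0.0)
         for i in range(size)]
    variance = sum(c[i] * c[j] * r(abs(i - j))
                   for i in range(size) for j in range(size))
    return sigma * math.sqrt(max(variance, 0.0))


def geometric(k):
    """Hypothetical autocorrelation function r_k = 0.8**k (illustration only)."""
    return 0.8 ** k


small = mean_difference_dispersion(geometric, 20, 40)
large = mean_difference_dispersion(geometric, 200, 400)
```

In this example the value for $(n, m) = (200, 400)$ is smaller than the value for $(20, 40)$, in line with the theorem; the decrease need not be monotone for small $n$ and $m$.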

In view of the applications, in particular certain questions
concerning ergodic hypotheses, it is an interesting problem whether
in the sums $\Sigma_n$ the sequence $i = 0, 1, 2, \dots$ can be replaced by the
sequence $i_0 < i_1 < i_2 < \cdots$. A short reflection on the singular processes
as defined and exemplified in sections 14-16 shows that such a
general sequence is not allowed.

The problems dealt with in the sequel have their points of 
connection with certain investigations on the continuous process 
and on special types of the discrete process. These earlier investiga- 
tions will be referred to in the course of the analysis. 

The concept of stationary process as introduced by A. Khintchine 
is extremely general. As a scheme for the analysis of time series 
it will be found to embrace all the schemes mentioned in the survey 
given in Chapter I. For the sake of concreteness it will be of 
interest, before passing to the general theoretical developments, to 
show in detail how these may be obtained by suitable specializa- 
tions. To this end a preparatory analysis of the general process 
will be useful. Accordingly, the next two sections will be reserved 
for some groundwork concerning operation with the discrete stationary 
processes. 

13. Some fundamental operations with random processes. 

In this section it will be shown that certain familiar operations 
with ordinary random variables can also be performed with random 
processes. In discussing the situation, no restrictions will be laid
on the processes considered. Generally speaking, the operations 
will give rise to new random processes. Moreover, if the processes 
dealt with are stationary, the resulting processes will be found to 
be stationary. It will be sufficient for our purpose to consider 
functions of random processes, and the forming of limit processes 
in convergent sequences. 

Denoting by $\xi$ a random variable in a $k$-dimensional space $R_k$,
and by $f[x]$ a function which is finite and Borel-measurable in
$R_k$, and whose values are lying in a space $R_p$, it is known that
$f[\xi]$ will be a well-defined random variable in $R_p$ (see e.g. H.
Cramér (1937), p. 12 f). Let us next consider a combined variable
$[\xi^{(1)}, \dots, \xi^{(n)}]$ consisting of random variables $\xi^{(i)}$ in $R_k$. Forming
the variables $f[\xi^{(i)}]$, it is evident that these in the same way
may be combined to a random variable $[f[\xi^{(1)}], \dots, f[\xi^{(n)}]]$ (cf. also
p. 10).

Thus prepared, let $f[x]$ remain the same, and consider the
variables $\xi(t_1, \dots, t_n) = [\xi(t_1), \dots, \xi(t_n)]$ which constitute a random
process $\{\xi(t)\}$. According to the above, the variables $[f[\xi(t_1)], \dots,
f[\xi(t_n)]]$ will be well-defined. Denoting by $\{F^*\}$ the set of
distribution functions of these variables, it is also evident that the
functions $F^*$ satisfy all consistency relations of type (53)-(54). Further,
if $\{\xi(t)\}$ is stationary, the functions $F^*$ also will satisfy (55). The
variables $[f[\xi(t_1)], \dots, f[\xi(t_n)]]$ will thus constitute a random process,
and if $\{\xi(t)\}$ is stationary, the process obtained will also be stationary.
The resulting process will be said to be a function of the process
$\{\xi(t)\}$, and be denoted by $\{f[\xi(t)]\}$. The variables of type $[f[\xi(t)],
f[\xi(t-1)], \dots, f[\xi(t-n)]]$ will be denoted by $f[\xi(t, t-1, \dots, t-n)]$.

In particular, considering a random process $\{\xi(t)\}$ obtained by
combining $k$ one-dimensional processes, say $\{\xi^{(1)}(t)\}, \dots, \{\xi^{(k)}(t)\}$, let
us take $f$ to be linear. The operation will then give rise to a sum
process of type $\{a_1 \xi^{(1)}(t) + \dots + a_k \xi^{(k)}(t)\}$. Denoting this sum by
$\{\xi^*(t)\}$, we shall write

$\{\xi^*(t)\} = a_1 \{\xi^{(1)}(t)\} + \dots + a_k \{\xi^{(k)}(t)\}$.

According to the above, we have

(58) $\xi^*(t_1, \dots, t_n) = [a_1 \xi^{(1)}(t_1) + \dots + a_k \xi^{(k)}(t_1), \dots, a_1 \xi^{(1)}(t_n) + \dots + a_k \xi^{(k)}(t_n)] =$

$= a_1 \xi^{(1)}(t_1, \dots, t_n) + \dots + a_k \xi^{(k)}(t_1, \dots, t_n)$.
Most of the functional operations dealt with in the sequel are of
this simple kind. As to the processes $\{\xi^{(1)}(t)\}, \dots, \{\xi^{(k)}(t)\}$ combined,
these, too, are for the most part of a simple structure. We shall
next present two type cases.

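Let $\{\xi^{(1)}(t)\}, \dots, \{\xi^{(k)}(t)\}$ represent a set of random processes and
let a set of time points $(t) = (t_1, \dots, t_n)$ and $k$ sets of real numbers
$(u^{(s)}) = (u_1^{(s)}, \dots, u_n^{(s)})$; $s = 1, \dots, k$, be chosen arbitrarily. Then the
processes $\{\xi^{(s)}\}$ will be called independent if the following relation
is satisfied

$P[\xi^{(1)}(t_1) \le u_1^{(1)}, \dots, \xi^{(1)}(t_n) \le u_n^{(1)}, \dots, \xi^{(k)}(t_1) \le u_1^{(k)}, \dots, \xi^{(k)}(t_n) \le u_n^{(k)}] =$

$= P[\xi^{(1)}(t_1) \le u_1^{(1)}, \dots, \xi^{(1)}(t_n) \le u_n^{(1)}] \cdots P[\xi^{(k)}(t_1) \le u_1^{(k)}, \dots, \xi^{(k)}(t_n) \le u_n^{(k)}]$.

Similarly, a sequence $\{\xi^{(1)}(t)\}, \{\xi^{(2)}(t)\}, \dots$ will be said to consist
of independent processes if the processes in every finite subset
$\{\xi^{(i_1)}(t)\}, \dots, \{\xi^{(i_k)}(t)\}$ are independent.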

Now, let it be assumed that the independent processes $\{\xi^{(i)}(t)\}$
are stationary, and have finite dispersions $D(\xi^{(i)})$. Then the sum
process $\{\xi^*(t)\}$ as defined by (58) will be stationary, and the
expectation, the dispersion, and the autocorrelation coefficients of $\{\xi^*(t)\}$
will exist. We have $E[\xi^*] = a_1 E[\xi^{(1)}] + a_2 E[\xi^{(2)}] + \dots + a_k E[\xi^{(k)}]$
and, as is readily verified,

(59) $D^2(\xi^*) = a_1^2 D^2(\xi^{(1)}) + a_2^2 D^2(\xi^{(2)}) + \dots + a_k^2 D^2(\xi^{(k)})$,

$r_p(\xi^*) = a_1^2 \frac{D^2(\xi^{(1)})}{D^2(\xi^*)} r_p(\xi^{(1)}) + \dots + a_k^2 \frac{D^2(\xi^{(k)})}{D^2(\xi^*)} r_p(\xi^{(k)})$.


Of course, the two latter relations depend on the identities

(60) $r(\xi^{(r)}(t \pm p); \xi^{(s)}(t \pm q)) = 0$, $p \ge 0$, $q \ge 0$,

where $r$ and $s$ are arbitrary. Having stated this, $\{\xi^{(r)}\}$ and $\{\xi^{(s)}\}$
will be termed non-correlated or uncorrelated if the relations (60) are
satisfied. Evidently, the relations (59) hold under the broader
assumption that any two processes $\{\xi^{(r)}\}$ and $\{\xi^{(s)}\}$ are non-correlated.
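
The relations (59) translate directly into a short computation. The following is a minimal sketch; the function name and the numerical inputs are hypothetical, and the component processes are assumed to be mutually non-correlated in the sense of (60).

```python
def sum_process_characteristics(coeffs, variances, autocorrs):
    """Squared dispersion and autocorrelation coefficients of the sum process
    a_1 xi^(1) + ... + a_k xi^(k), following (59).

    coeffs    : the constants a_1, ..., a_k
    variances : the squared dispersions D^2(xi^(i))
    autocorrs : for each component, its coefficients r_0, r_1, ..., r_P
    """
    total_variance = sum(a * a * v for a, v in zip(coeffs, variances))
    n_lags = len(autocorrs[0])
    r_sum = [sum(a * a * v * rs[p]
                 for a, v, rs in zip(coeffs, variances, autocorrs)) / total_variance
             for p in range(n_lags)]
    return total_variance, r_sum


# Two hypothetical components: one autocorrelated at lag 1, one not.
total_variance, r_sum = sum_process_characteristics(
    coeffs=[1.0, 2.0],
    variances=[4.0, 1.0],
    autocorrs=[[1.0, 0.5], [1.0, 0.0]],
)
```

Here both components contribute equally to the squared dispersion ($4 + 4 = 8$), so the lag-1 coefficient of the sum is the average of $0.5$ and $0$.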

In order to define the second type case, let us consider the
variables $\xi(t_1, \dots, t_n)$ which constitute a random process $\{\xi(t)\}$. Choosing
arbitrarily an integer $k$, let us form a second type of variable, say
$\xi^*(t_1, \dots, t_n)$, by taking $\xi^*(t_1, \dots, t_n) = \xi(t_1 + k, \dots, t_n + k)$. Evidently,
these variables $\xi^*$ constitute a random process. Now, denoting this
process by $\{\xi(t + k)\}$, a short reflection shows that we may combine
the processes $\{\xi(t)\}$ and $\{\xi(t + k)\}$. In fact, the distribution
functions ruling the simultaneous behaviour of $\{\xi(t)\}$ and $\{\xi(t + k)\}$ are
uniquely determined by the distribution functions $\{F\}$ ruling the
process $\{\xi(t)\}$, and it is further evident that the resulting
distribution functions will satisfy all consistency relations (53)-(54).
Moreover, if $\{\xi(t)\}$ is stationary, the combined process will be stationary.

The arguments being perfectly general, we can form the
processes $\{\xi(t - 1)\}, \dots, \{\xi(t - h)\}$ and combine them into an $(h+1)$-dimensional
process. Now, if $\{\xi(t)\}$ is one-dimensional, we can
apply a linear operation of type (58) to the combined process. The
result will be a process, say $\{\zeta(t)\}$, such that the corresponding
variables $\zeta(t)$ and $\zeta(t_1, \dots, t_n)$ will satisfy relations like

(61) $\zeta(t) = a_0 \xi(t) + a_1 \xi(t - 1) + \dots + a_h \xi(t - h)$

(62) $\zeta(t_1, \dots, t_n) = [a_0 \xi(t_1) + a_1 \xi(t_1 - 1) + \dots + a_h \xi(t_1 - h),
\dots, a_0 \xi(t_n) + a_1 \xi(t_n - 1) + \dots + a_h \xi(t_n - h)]$.

If $\{\xi(t)\}$ is stationary, then $\{\zeta(t)\}$ will also be stationary.

The operations considered above may also be applied to
observational time series. Letting $\dots, \xi_{t-1}, \xi_t, \xi_{t+1}, \dots$ represent such a
series, and transforming by means of a function $f[x]$, the resulting
series will read $\dots, f[\xi_{t-1}], f[\xi_t], f[\xi_{t+1}], \dots$. If, in particular, every
$\xi_t$ consists of a set of $k$ observations, say $(\xi_t^{(1)}, \dots, \xi_t^{(k)})$, and if $f[x]$
is linear, the transformed series, say $\dots, \zeta_{t-1}, \zeta_t, \zeta_{t+1}, \dots$, will have
for general element

$\zeta_t = a_1 \xi_t^{(1)} + a_2 \xi_t^{(2)} + \dots + a_k \xi_t^{(k)}$.

On the other hand, assuming $\xi_t$ to be one-dimensional, the
transform (61) corresponds simply to a moving linear operation. In this
case the general element in the transformed series reads

$\zeta_t = a_0 \xi_t + a_1 \xi_{t-1} + \dots + a_h \xi_{t-h}$.
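
Applied to a finite observed series, the moving linear operation can be sketched as follows. This is an illustration only: the function name is hypothetical, and the output is restricted to those time points for which all $h + 1$ required observations are available, which is a choice of this sketch rather than something prescribed by the text.

```python
def moving_linear_operation(series, coeffs):
    """Transform a one-dimensional observed series by
    zeta_t = a_0 * xi_t + a_1 * xi_(t-1) + ... + a_h * xi_(t-h),
    with coeffs = [a_0, a_1, ..., a_h]. Only time points t >= h are
    returned, since earlier ones lack some of the needed observations."""
    h = len(coeffs) - 1
    return [sum(coeffs[j] * series[t - j] for j in range(h + 1))
            for t in range(h, len(series))]


observations = [1.0, 2.0, 3.0, 4.0]
first_differences = moving_linear_operation(observations, [1.0, -1.0])
two_term_average = moving_linear_operation(observations, [0.5, 0.5])
```

With $a_0 = 1$, $a_1 = -1$ the operation forms first differences; with $a_0 = a_1 = 1/2$ it forms a moving average of two terms.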

In the theoretical developments, we shall frequently have to
consider sums of type (58) when the number of terms tends to infinity.
For use in such connexions, we need a suitable definition of
convergence. To this end we shall next extend the concept of
convergence in probability, as introduced by F. P. Cantelli (1916), so
as to apply to a sequence of random processes.

A sequence $\xi^{(1)}, \xi^{(2)}, \dots$ of ordinary random variables is said to
converge in probability to a random variable $\xi$ if for every $\varepsilon > 0$

$P[|\xi^{(n)} - \xi| > \varepsilon]$

tends to zero as $n \to \infty$. A necessary and sufficient condition for
such convergence is that, for every $\varepsilon > 0$, there exists a number $n$
such that for an arbitrary $q > 0$ the following inequality holds
(see A. Kolmogoroff (1933), p. 32):

$P[|\xi^{(n+q)} - \xi^{(n)}| > \varepsilon] < \varepsilon$.

Using the familiar interpretation of $|x - y|$ as the distance between
two points $x$ and $y$ in a multi-dimensional space, the definition of
convergence also holds in case the variables are $k$-dimensional, say
$\xi^{(n)} = [\xi_1^{(n)}, \dots, \xi_k^{(n)}]$, $\xi = [\xi_1, \dots, \xi_k]$. Of course, an equivalent definition
is that for every $\varepsilon > 0$ the probability

$P[|\xi_1^{(n)} - \xi_1| < \varepsilon, \dots, |\xi_k^{(n)} - \xi_k| < \varepsilon]$

tends to unity as $n \to \infty$. Now, considering the elementary
inequality of G. Boole,

(64) $P[|\xi_r^{(n)} - \xi_r| > \varepsilon] \le 1 - P[|\xi_1^{(n)} - \xi_1| \le \varepsilon, \dots, |\xi_k^{(n)} - \xi_k| \le \varepsilon]$

$\le P[|\xi_1^{(n)} - \xi_1| > \varepsilon] + \dots + P[|\xi_k^{(n)} - \xi_k| > \varepsilon]$,

it is evident that a necessary and sufficient condition that $\xi^{(n)}$
converges in probability to $\xi$ is that $\xi_r^{(n)}$ converges in probability to $\xi_r$
for $r = 1, \dots, k$ (see F. P. Cantelli (1916) and E. Slutsky (1925)).
Thus prepared, let a sequence of random processes be denoted by

(65) $\{\xi^{(1)}(t)\}, \{\xi^{(2)}(t)\}, \dots$

The sequence will be called convergent in probability to a limit
process $\{\xi(t)\}$ if for an arbitrary set $(t) = (t_1, \dots, t_n)$ the sequence

(66) $\xi^{(1)}(t_1, \dots, t_n), \xi^{(2)}(t_1, \dots, t_n), \dots$

be convergent in probability to the limit variable $\xi(t_1, \dots, t_n)$.
Theorem 1. A necessary and sufficient condition that a sequence
(65) of random processes be convergent in probability is that for an
arbitrary $t$ the sequence

(67) $\xi^{(1)}(t), \xi^{(2)}(t), \dots$

be convergent in probability. If the sequence (65) is convergent, and
if every process $\{\xi^{(n)}(t)\}$ is stationary, the limit process will be
stationary.


The necessity of the condition is implied in the above definition
of convergence in probability. Next, let $(t) = (t_1, \dots, t_n)$ be arbitrarily
fixed, and consider the sequence (66). According to the above
application of the inequality of G. Boole, the convergence of (66)
is implied in the convergence of (67) for every $t$ in the set $(t_1, \dots, t_n)$.
Having stated this, let the limit variable be denoted by $\xi(t_1, \dots, t_n)$,
and consider the limit variables belonging to all possible sets $(t) =
(t_1, \dots, t_n)$. Since the consistency relations of type (53)-(54) are
satisfied for every process in the sequence (65), the same relations
must be satisfied in the limit. The variables $\xi(t_1, \dots, t_n)$ will thus
constitute a random process, say $\{\xi(t)\}$, and the same argument
shows that the limit process $\{\xi(t)\}$ is stationary if every process in
the sequence (65) is stationary.

The following corollary needs no comment. 

Corollary. Letting $\{\xi(t)\}$ represent a random process, the sequence
(65) converges in probability to $\{\xi(t)\}$ if, and only if, for an arbitrary
$t$ the sequence (67) converges in probability to $\xi(t)$.

In dealing with stationary sequences (65), the theorem proved
above will be particularly useful, for the behaviour of (67) in
respect of convergence will then be independent of $t$. Considering,
in particular, the sum process $\{\xi^*(t)\}$ defined by (58), a necessary
and sufficient condition for convergence in probability as $k \to \infty$
is that the sum $\sum_{i=1}^{\infty} a_i \xi^{(i)}(t)$ is convergent. Reference is made to A.
Kolmogoroff (1933) for the groundwork on infinite series of
random variables.

A sufficient condition of convergence is, of course, that the sum
$\sum_{i=1}^{\infty} a_i E[\xi^{(i)}]$ is convergent, and that the dispersion of $\sum_{i=n}^{n+p} a_i \xi^{(i)}(t)$
tends to zero uniformly in $p$ as $n \to \infty$. Moreover, writing $m_n$ and $\sigma_n$
for the mean and dispersion of $\sum_{i=1}^{n} a_i \xi^{(i)}(t)$, it follows readily that
if these conditions are satisfied, it would imply a contradiction were
not the mean and dispersion of the limit variable given by $\lim_{n \to \infty} m_n$
and $\lim_{n \to \infty} \sigma_n$ respectively.

As a second application of theorem 1 we make the following
observation. Denoting by $\{\xi(t)\}$ an arbitrary discrete stationary
process, let a sequence (65) of processes be defined by

$\{\xi^{(i)}(t)\} = [\dots, \xi(t + 1), \xi(t), \xi(t - 1), \dots, \xi(t - i), 0, 0, \dots]$.

Then for every fixed $(t) = (t_1, \dots, t_n)$ we have $\lim_{i \to \infty} \xi^{(i)}(t_1, \dots, t_n) =
\xi(t_1, \dots, t_n)$. It follows that $\{\xi^{(i)}(t)\}$ converges in probability to
$\{\xi(t)\}$.

Extending a current terminology, two processes, say $\{\xi^{(1)}(t)\}$ and
$\{\xi^{(2)}(t)\}$, will be called equivalent, if for an arbitrary $(t) = (t_1, \dots, t_n)$
the two variables $\xi^{(1)}(t_1, \dots, t_n)$ and $\xi^{(2)}(t_1, \dots, t_n)$ are equivalent, i.e. if

$P[\xi^{(1)}(t_1, \dots, t_n) \ne \xi^{(2)}(t_1, \dots, t_n)] = 0$.

If two variables or processes are equivalent, we shall write
$\xi^{(1)} = \xi^{(2)}$, $\{\xi^{(1)}(t)\} = \{\xi^{(2)}(t)\}$, etc.


14. On singular stationary processes. 

Let $\xi = [\xi^{(1)}, \dots, \xi^{(n)}]$ represent an $n$-dimensional random variable
with distribution function $F(u_1, \dots, u_n)$. The distribution of $\xi$ will
be called linearly singular or, more briefly, singular, if there exists
a linear function, say $L[x - m] = a_1 (x^{(1)} - m_1) + \dots + a_n (x^{(n)} - m_n)$,
such that

(68) $P[L[\xi - m] \ne 0] = P[a_1 (\xi^{(1)} - m_1) + \dots + a_n (\xi^{(n)} - m_n) \ne 0] = 0$.

If (68) is satisfied, the variables $\xi^{(i)}$ will be said to be connected
by the relation $L[\xi - m] = 0$. The singularity will be said to be
of rank $h$, if there exist $n - h$, and only $n - h$, independent
relations between the variables $\xi^{(i)}$, say

(69) $a_{1,h+1} (\xi^{(1)} - m_1) + \dots + a_{n,h+1} (\xi^{(n)} - m_n) = 0$,
$\dots$
$a_{1n} (\xi^{(1)} - m_1) + \dots + a_{nn} (\xi^{(n)} - m_n) = 0$.

It will be relevant to interpret singularity in terms of
characteristic functions. Denoting the characteristic function of $\xi$ by
$f(X_1, \dots, X_n)$, we have by definition

$f(X_1, \dots, X_n) = E[e^{i (X_1 \xi^{(1)} + \dots + X_n \xi^{(n)})}] = \int e^{i (X_1 x_1 + \dots + X_n x_n)} \, dF(x_1, \dots, x_n)$.

Letting a matrix of real elements be given by

(70) $\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$

and writing $(Z) = (Z_1, \dots, Z_n)$ for an auxiliary set of real variables,
let the substitution

(71) $X_1 = a_{11} Z_1 + \dots + a_{1n} Z_n$,
$\dots$
$X_n = a_{n1} Z_1 + \dots + a_{nn} Z_n$

transform $f(X_1, \dots, X_n)$ into $f^*(Z_1, \dots, Z_n)$. Considering the variable
$\zeta = (\zeta^{(1)}, \dots, \zeta^{(n)})$ defined by

(72) $\zeta^{(1)} = a_{11} \xi^{(1)} + \dots + a_{n1} \xi^{(n)}$,
$\dots$
$\zeta^{(n)} = a_{1n} \xi^{(1)} + \dots + a_{nn} \xi^{(n)}$,

an elementary transformation shows that

(73) $f^*(Z_1, \dots, Z_n) = E[e^{i (X_1 \xi^{(1)} + \dots + X_n \xi^{(n)})}] = E[e^{i (Z_1 \zeta^{(1)} + \dots + Z_n \zeta^{(n)})}]$.


Thus $f^*$ is nothing else than the characteristic function of the
composite variable $\zeta$. Now, let $\xi$ be singular of rank $h$, say on account
of the relations (69), and introduce for a moment the inconsequential
assumption $m_i = 0$. In such case $f^*(Z_1, \dots, Z_n)$ will contain at
most the variables $Z_1, \dots, Z_h$. On the other hand, if the matrix (70)
is non-singular the first $h$ variables must all appear in $f^*(Z_1, \dots, Z_n)$.
In fact, since the distribution of $\zeta$ is uniquely determined by its
characteristic function (see e.g. H. Cramér (1937) p. 104), the
vanishing of $Z_m$ in $f^*$ for a number $m \le h$ would imply $\zeta^{(m)} = 0$, a
linear relation not obtainable from (69).

It will be observed that formula (73) holds for arbitrary values
of the real coefficients $a_{ik}$. In particular, the characteristic function
of the $h < n$ first variables $\zeta^{(i)}$ as given by (72) will be arrived at
by putting $a_{ik} = 0$ for $k = h + 1, h + 2, \dots, n$.

If $\xi$ is singular on account of a relation of type (68), and if
$E[\xi^{(i)}]$ exists, we shall always assume that

(74) $m_i = E[\xi^{(i)}]$,

which evidently will involve no loss of generality. Now, calculating
$E[\zeta^{(i)}]$ from (72), and paying regard to (74), we get

(75) $m_1^* = a_{11} m_1 + \dots + a_{n1} m_n = E[\zeta^{(1)}]$,
$\dots$
$m_n^* = a_{1n} m_1 + \dots + a_{nn} m_n = E[\zeta^{(n)}]$.

Next, let it be assumed that all the variables $\xi^{(i)}$ have a finite
dispersion. Then the expectation

$E\{[X_1 (\xi^{(1)} - m_1) + \dots + X_n (\xi^{(n)} - m_n)]^2\} = \int [X_1 (x_1 - m_1) + \dots + X_n (x_n - m_n)]^2 \, dF(x_1, \dots, x_n)$

will exist, and represent a non-negative quadratic form in the real
variables $X_i$, say $Q(X_1, X_2, \dots, X_n)$. Writing

$\mu_{ik} = E[(\xi^{(i)} - m_i)(\xi^{(k)} - m_k)]$,

we have $\mu_{ik} = \mu_{ki}$, and

(76) $Q(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} \sum_{k=1}^{n} \mu_{ik} X_i X_k$.

*=1 A ~1 

The quadratic form $Q$ will give information concerning the
singularity of the distribution of $\xi$. In fact, the rank of the quadratic
form equals the rank of the distribution. In other words, by a
suitable substitution in $Q$ of type (71) the number of variables may
be brought down to the rank of $\xi$, but not further, and vice
versa. Denoting the transformed forms by $Q^*$, we have also

$Q^*(Z_1, \dots, Z_n) = E\{[Z_1 (\zeta^{(1)} - m_1^*) + \dots + Z_n (\zeta^{(n)} - m_n^*)]^2\}$,

so the forms $Q^*$ thus are quite analogous to the characteristic
functions $f^*$ in respect of linear substitutions. As the proofs, too, may
be given a parallel wording, the verification of the statements made
needs no comment.
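
The statement that the rank of the quadratic form equals the rank of the distribution can be illustrated numerically. The following is a minimal sketch under stated assumptions: the three-dimensional variable, with $\xi^{(3)} = \xi^{(1)} + \xi^{(2)}$ and uncorrelated $\xi^{(1)}, \xi^{(2)}$ of squared dispersion 1 and 4, is a hypothetical example, and the rank routine is a generic exact elimination rather than anything taken from the text.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column, below the rows already used.
        pivot = next((i for i in range(rank, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


# Moments mu_ik of (76) for the hypothetical variable described above:
# one linear relation among n = 3 variables, hence rank h = 2.
moments = [[1, 0, 1],
           [0, 4, 4],
           [1, 4, 5]]
singular_rank = matrix_rank(moments)

# Without any linear relation the form has full rank.
regular_rank = matrix_rank([[1, 0, 0], [0, 4, 0], [0, 0, 5]])
```

With $n = 3$ variables and one independent relation, the rank is $h = 2$, in agreement with the definition accompanying (69).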

After these preliminaries, the next step will be to investigate as
to singularity the variables $\xi(t_1, \dots, t_n)$ connected with a stationary
process $\{\xi(t)\}$. Until further notice, we shall not assume that $\{\xi(t)\}$
has a finite dispersion.

Suppose first that $\xi(t, t-1, \dots, t-n)$ is singular on account of
the relation

(77) $\xi(t) - m + a_1 (\xi(t-1) - m) + \dots + a_h (\xi(t-h) - m) = 0$,

where $h \le n$. Then (a) the relation of singularity (77) must hold for
every $t$, (b) if $h < n$, the variable $\xi(t, t-1, \dots, t-h)$ must also
present the same singularity. Further, if $\xi(t, t-1, \dots, t-h)$ satisfies
a second linear relation of type (77), say with coefficients $(a_1', \dots, a_h')$,
we obtain by subtraction a linear relation showing that there is an
$h' < h$ such that $\xi(t, t-1, \dots, t-h')$ is singular. Thus, when
considering a stationary process $\{\xi(t)\}$, a number $h$ will be well-defined
by terming $\{\xi(t)\}$ singular of rank $h$ if in the sequence

$\xi(t), \xi(t, t-1), \xi(t, t-1, t-2), \dots$

the variable $\xi(t, t-1, \dots, t-h)$ is the first singular one. Taking
(77) for the singularity relation of $\xi(t, t-1, \dots, t-h)$, it is readily
seen that

(A) the coefficients $a_i$ are uniquely determined, and $a_h \ne 0$.

(B) for all $k \ge 0$ the $(k + h + 1)$-dimensional variable $\xi(t_0 + k,
t_0 + k - 1, \dots, t_0 - h)$ is singular of rank $h$, and a system of $k + 1$
independent relations is given by

(78) $\xi(t_0) - m + a_1 (\xi(t_0 - 1) - m) + \dots + a_h (\xi(t_0 - h) - m) = 0$,
$\dots$
$\xi(t_0 + k - 1) - m + a_1 (\xi(t_0 + k - 2) - m) + \dots + a_h (\xi(t_0 + k - h - 1) - m) = 0$,
$\xi(t_0 + k) - m + a_1 (\xi(t_0 + k - 1) - m) + \dots + a_h (\xi(t_0 + k - h) - m) = 0$.
Thus, if (77) holds the variable $\zeta(t_1, \dots, t_n)$ defined by (61), taking
$a_0 = 1$, will reduce to $[E[\zeta], \dots, E[\zeta]]$. Further we note that the
relations of singularity (78) extend so as to give the identity

(79) $\xi(t) - m + a_1 (\xi(t-1) - m) + \dots + a_h (\xi(t-h) - m) = 0$
with $t = 0, \pm 1, \pm 2, \dots$

According to observation (B), a sample value

$\xi_i(t_0 + k, t_0 + k - 1, \dots, t_0 - h) = [\xi_i(t_0 + k), \xi_i(t_0 + k - 1), \dots, \xi_i(t_0 - h)]$

will, when regarding $\xi_i(t_0 + t)$ as a function of $t$, with probability 1
satisfy the difference equation (32). Thus, if this difference
equation reduces to (30), any sample series

$[\xi_i(t_0 + k), \xi_i(t_0 + k - 1), \dots, \xi_i(t_0 - h)]$

will with probability 1 be of type (17), i.e. consist of a number of
superposed harmonics. This case will accordingly be referred to as
a process of superposed harmonics.
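
A single harmonic gives the simplest instance of a singularity relation. The following is a minimal sketch, assuming a path of the form $A \cos \lambda t + B \sin \lambda t$ with $m = 0$; the amplitudes, the frequency and the function names are hypothetical, and the equations (17), (30) and (32) cited above are not reproduced here. By the addition theorems such a path satisfies a relation of type (77) with $h = 2$, $a_1 = -2 \cos \lambda$, $a_2 = 1$.

```python
import math


def harmonic_path(amplitude_cos, amplitude_sin, lam, times):
    """Values A*cos(lam*t) + B*sin(lam*t) at the given integral time points."""
    return [amplitude_cos * math.cos(lam * t) + amplitude_sin * math.sin(lam * t)
            for t in times]


def singularity_residuals(path, coeffs, mean=0.0):
    """Left member of a relation of type (77),
    xi(t) - m + a_1*(xi(t-1) - m) + ... + a_h*(xi(t-h) - m),
    evaluated wherever all h earlier values are available;
    coeffs = [a_1, ..., a_h]."""
    h = len(coeffs)
    return [(path[t] - mean)
            + sum(coeffs[j - 1] * (path[t - j] - mean) for j in range(1, h + 1))
            for t in range(h, len(path))]


lam = 0.7  # hypothetical angular frequency
path = harmonic_path(1.3, -0.4, lam, range(50))
residuals = singularity_residuals(path, [-2.0 * math.cos(lam), 1.0])
largest_residual = max(abs(x) for x in residuals)
```

The residuals vanish up to rounding error, whatever the amplitudes, so every realization of such a process obeys the same linear relation.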

Resuming the assumption that the process considered has a finite 
dispersion, we are now in a position to state 

Theorem 2. Let $\{\xi(t)\}$ be a discrete stationary process with
autocorrelation coefficients $r_k$. If $\{\xi(t)\}$ is linearly singular, $\{\xi(t)\}$ is a
process of superposed harmonics. A necessary and sufficient condition
that $\{\xi(t)\}$ be linearly singular, say on account of the relation $L[\xi(t) - m] = 0$
given by (77), is that $r_k$ satisfies the difference equation $L[r_k] = 0$.

Denoting by $m$ and $\sigma$ the mean and dispersion of the process
considered, the autocorrelation coefficients are given by (cf. (56))

(80) $r_k = r_{-k} = E[(\xi(t) - m)(\xi(t - k) - m)]/\sigma^2$.

Thus the quadratic form of type (76) belonging to $\xi(t, t-1, \dots, t-n)$,
say $Q_n(X_t, X_{t-1}, \dots, X_{t-n})$, will be well-defined, and given by

$Q_n(X_t, \dots, X_{t-n}) = \sigma^2 \sum_{p=0}^{n} \sum_{q=0}^{n} r_{p-q} X_{t-p} X_{t-q}$.

Since the form $Q_n$ is non-negative definite, its principal determinant
will be non-negative (see e.g. G. Kowalewski (1909), Chapter 12),

(81) $\Delta(r, n) = \begin{vmatrix} 1 & r_1 & r_2 & \cdots & r_n \\ r_1 & 1 & r_1 & \cdots & r_{n-1} \\ r_2 & r_1 & 1 & \cdots & r_{n-2} \\ \vdots & & & & \vdots \\ r_n & r_{n-1} & r_{n-2} & \cdots & 1 \end{vmatrix}$.

The determinants $\Delta(r, n)$ defined above will be called the principal
correlation determinants of the stationary process examined.
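
The principal correlation determinants are easy to evaluate numerically. The following is a minimal sketch: it assumes that $\Delta(r, n)$ is of order $n + 1$, built from $r_0 = 1, r_1, \dots, r_n$, and it uses two hypothetical autocorrelation functions, $r_k = \cos 0.7k$ (a single harmonic with uncorrelated amplitudes of equal dispersion) and $r_k = 0.5^{|k|}$. The tolerance below which a pivot is treated as zero is an arbitrary choice of this sketch.

```python
import math


def correlation_determinant(r, n, tol=1e-12):
    """Determinant of the (n+1) x (n+1) matrix with elements r(|i - j|),
    computed by Gaussian elimination with partial pivoting. A pivot smaller
    than `tol` in absolute value is treated as zero."""
    size = n + 1
    a = [[float(r(abs(i - j))) for j in range(size)] for i in range(size)]
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < tol:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row interchange changes the sign
        det *= a[col][col]
        for i in range(col + 1, size):
            factor = a[i][col] / a[col][col]
            for j in range(col, size):
                a[i][j] -= factor * a[col][j]
    return det


def harmonic(k):
    """Hypothetical autocorrelation of a single harmonic: r_k = cos(0.7 k)."""
    return math.cos(0.7 * k)


def geometric(k):
    """Hypothetical non-singular case: r_k = 0.5**k."""
    return 0.5 ** k


delta_harmonic_1 = correlation_determinant(harmonic, 1)
delta_harmonic_2 = correlation_determinant(harmonic, 2)
delta_geometric_2 = correlation_determinant(geometric, 2)
```

For the harmonic coefficients the determinant of order three vanishes while that of order two does not, which corresponds to a singularity of rank 2; for the geometric coefficients the determinants stay positive.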

Thus prepared, let us begin with proving the second part of the
theorem. In the first place, let $\{\xi(t)\}$ be singular on account of
(77), and multiply the left member of this relation by $\xi(t-k)/\sigma^2$.
Observing that the expectation of the resulting expression is zero,
and paying regard to (80), we get

(82)   $L[r_k] = r_k + a_1 r_{k-1} + \cdots + a_h r_{k-h} = 0.$

This relation shows that the condition is necessary.

On the other hand, let the autocorrelation coefficients $r_k$ satisfy
a linear difference equation $L[r_k] = 0$ of order $h^*$. Transforming this
equation to the form (32), and reducing to lowest possible order,
say $h$, let $L_1[r_k] = 0$ be the result. According to the previous
analysis, $\xi(t)$ will then satisfy no linear difference relation of order $< h$.
After this remark, let the consecutive rows of $\Delta(r, h)$ be denoted
by $\rho_0, \rho_1, \ldots, \rho_h$. From the structure of $\Delta(r, h)$ it is evident that
these rows are connected by the linear relation $L_1[\rho_h] = 0$. Thus
$\Delta(r, h)$ equals zero, so the rank of $\Delta$ will be $\le h$. Recalling from
the theory of quadratic forms that the rank of $\Delta(r, n)$ equals the
rank of $Q_n$, and keeping in mind the identity between the ranks
of corresponding distributions and quadratic forms of type (76), it
turns out that $\xi(t, t-1, \ldots, t-h)$ is linearly singular. Let the relation
of singularity be $L_2[\xi(t) - m] = 0$. Now, were not $L_2 \equiv L_1$,
then, contrarily to the assumption made, $r_k$ would satisfy a linear
difference relation of order $< h$. Moreover, according to the construction
of $L_1$ we have $L[x] = L^*[L_1[x]]$, where $L^*$ is a well-defined
linear operation. Thus
$L[\xi(t) - m] = L^*[L_1[\xi(t) - m]] = L^*[0] = 0$,
which proves that the condition is sufficient.

Passing to the first part of theorem 2, let (77) be the
relation of singularity, and let it be assumed that this has already
been reduced to the lowest possible order. According to the above,
$r_k$ will then satisfy (82), but no linear equation of lower order.
Writing $r_k$ on the form (33), this implies that none of the polynomials
involved will be vanishing identically. Since, finally, the inequality
$|r_k| \le 1$ shows that $r_k$ is uniformly bounded in modulus in
$(-\infty < k < \infty)$, we conclude from the second remark in section 6
that $r_k$ can be written on the form (17).

After this analysis, the following theorem will need no comment. 

Theorem 3. Let $\{\xi(t)\}$ be a discrete stationary process with principal
correlation determinants $\Delta(r, n)$ given by (81). A necessary and
sufficient condition that $\{\xi(t)\}$ be singular of rank $h$ is that $\Delta(r, h)$
be the first vanishing determinant in the sequence $\Delta(r, 1), \Delta(r, 2), \ldots$
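
The criterion of theorem 3 may be tried out numerically. The following is a minimal sketch, not part of the original text. It assumes, reading (81), that $\Delta(r, n)$ is the determinant of the $(n+1)$-rowed matrix with elements $r_{|p-q|}$, and it takes $r_k = \cos \lambda k$ with an arbitrarily chosen $\lambda = 0.7$ as a hypothetical single harmonic. The function names and the tolerance are assumptions made for the illustration.

```python
import numpy as np


def correlation_determinant(r, n):
    """Delta(r, n): determinant of the (n+1) x (n+1) matrix with elements r_|p-q|."""
    idx = np.arange(n + 1)
    matrix = np.asarray(r, dtype=float)[np.abs(idx[:, None] - idx[None, :])]
    return float(np.linalg.det(matrix))


def singular_rank(r, max_n, tol=1e-9):
    """Index h of the first vanishing determinant Delta(r, 1), Delta(r, 2), ...

    Returns None when no determinant up to max_n vanishes (within tol).
    """
    for n in range(1, max_n + 1):
        if abs(correlation_determinant(r, n)) < tol:
            return n
    return None


lam = 0.7                                  # hypothetical angular frequency
r_harmonic = np.cos(lam * np.arange(8))    # r_k = cos(lam * k)
r_random = np.zeros(8)
r_random[0] = 1.0                          # purely random process: r_k = 0 for k != 0

print(singular_rank(r_harmonic, 6))
print(singular_rank(r_random, 6))
```

For the single harmonic the first vanishing determinant is $\Delta(r, 2)$, in agreement with the fact that $r_k = \cos \lambda k$ satisfies $r_k - 2 \cos \lambda \cdot r_{k-1} + r_{k-2} = 0$, a difference equation of order 2. For the purely random case no determinant vanishes.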

A relation of type (77) will be called a stochastical difference relation
of order $h$ satisfied by the stationary process $\{\xi(t)\}$. The previous
analysis shows that a stationary process which has finite
dispersion and satisfies (77) will satisfy a difference relation of type

(83) d**£it - s) + h x £» - 5 + 1) + • ■ • + h 6 g(t) - tn] = 0. 

In the next two sections it will be shown, i. a., that the theorems 
of the present section are not vacuous, i. e. that there really exist 
stationary processes having the properties assumed by hypothesis. 


15. Some type cases of the discrete stationary process. 

As mentioned in section 12, it will be shown in the present sec- 
tion that the schemes surveyed in Chapter I may be regarded as 
special cases of the discrete stationary process. Some other schemes 
will also be presented, and a few characteristic properties of the 
different types considered be pointed out. Conditioned variables 
and expectations, and the operation of addition will be exemplified, 
and devices given for the construction of model series which follow 
the different schemes. For further concreteness some model series 
constructed for the illustration of later results will be furnished. 

α. The purely random process. This term will in the sequel be
used for the purely random scheme touched upon in section 7.
The purely random process will, of course, be obtained by taking,
in the relation (52) defining the general process,

$F(t_1, \ldots, t_n; u_1, \ldots, u_n) = F(u_1) \cdots F(u_n),$




where any distribution function may be chosen for $F(u)$. The
verification of (53) to (55) is obvious.

When detailed information is required, a purely random process
$\{\xi(t)\}$ defined by a distribution function $F(u)$ will be denoted by
$\{\xi(t; F)\}$. It is seen that the defining function $F(u)$ is identical
with the principal distribution function of the process.

The following simple theorem exemplifies the operation of addi- 
tion of independent processes. 

Theorem 4. Let $\{\xi^{(1)}(t; F^{(1)})\}, \{\xi^{(2)}(t; F^{(2)})\}, \ldots$ represent
independent, purely random processes such that the infinite convolution

(84)   $F^{(1)} * F^{(2)} * \cdots$

is convergent. Then the sum $\{\xi^{(1)}(t; F^{(1)})\} + \{\xi^{(2)}(t; F^{(2)})\} + \cdots$ will
be convergent, and constitute a purely random process with the
convolution (84) for principal distribution function.

In fact, the convergent convolution (84) is the distribution function
of the sum $\sum_{i=1}^{\infty} \xi^{(i)}(t; F^{(i)})$, which is thus convergent. According
to a remark attached to theorem 1, the convergence of this sum
implies the convergence of $\sum_{i=1}^{\infty} \{\xi^{(i)}(t; F^{(i)})\}$.

A characteristic property of the purely random process is that
the two variables $\xi_C(t_1, \ldots, t_n)$ and $\xi(t_1, \ldots, t_n)$ will have identical
distribution functions for any condition $(C)$ not referring to any time
point in the set $(t) = (t_1, \ldots, t_n)$. Thus we have in this case (cf. (57))

$E_C[\xi(t)] = E[\xi(t)].$

Any random series, e. g. a series of records on throws with a 
die, will form a model series of the purely random process. The 
illustrative model series used in the present study are very 
simply constructed, the double purpose being to facilitate the cal- 
culations, and to bring into relief the characteristic features of 
the different types of process. For this construction, the well- 
known random sampling numbers of L. H. C. Tippett (1927) were 
used. 

Two independent random series, denoted by $(\alpha_t^{(1)})$ and $(\alpha_t^{(2)})$, will be
given for illustration. Denoting the corresponding processes by
$\{\alpha^{(1)}(t; F_1)\}$ and $\{\alpha^{(2)}(t; F_2)\}$ respectively, the defining function $F_1$ is
given by

$F_1(u) = 0$ for $u < -1$,   $F_1(u) = 0.1$ for $-1 \le u < 0$,
$F_1(u) = 0.9$ for $0 \le u < 1$,   $F_1(u) = 1$ for $1 \le u$,

and the function $F_2$ by

$F_2(u) = 0$ for $u < -1$,   $F_2(u) = 0.3$ for $-1 \le u < 0$,
$F_2(u) = 0.7$ for $0 \le u < 1$,   $F_2(u) = 1$ for $1 \le u$.


A short calculation shows that the mean value of each process
$\{\alpha^{(i)}\}$ equals zero, and that the variances, say $D_1^2$ and $D_2^2$, are

(85)   $D_1^2 = 0.2, \qquad D_2^2 = 0.6.$

Writing $\tau$ for the Tippett numbers, the model series $(\alpha_t^{(1)})$ consists
of the 1000 elements obtained from the first 1000 $\tau$-numbers
on page 1 by the use of the following code:

$\alpha_t^{(1)} = 1$ for $\tau = 0$;   $\alpha_t^{(1)} = 0$ for $\tau = 1, \ldots, 8$;   $\alpha_t^{(1)} = -1$ for $\tau = 9$.

The second series was obtained from the corresponding $\tau$-numbers
on page 2. The code used was

$\alpha_t^{(2)} = 1$ for $\tau = 0, 1, 2$;   $\alpha_t^{(2)} = 0$ for $\tau = 3, \ldots, 6$;
$\alpha_t^{(2)} = -1$ for $\tau = 7, 8, 9$.
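
The coding of digits into the values $-1$, 0, 1 may be sketched as follows. This is an illustration only: the Tippett numbers themselves are not reproduced here, so pseudo-random digits from Python's standard library stand in for them, and the names `CODE_1`, `CODE_2`, `moments` and `model_series` are assumptions of the sketch.

```python
import random

# Coding of a decimal digit tau into the values -1, 0, 1, as described above.
CODE_1 = {tau: (1 if tau == 0 else -1 if tau == 9 else 0) for tau in range(10)}
CODE_2 = {tau: (1 if tau <= 2 else 0 if tau <= 6 else -1) for tau in range(10)}


def moments(code):
    """Mean and variance of the coded variable when all ten digits are equally likely."""
    values = [code[tau] for tau in range(10)]
    mean = sum(values) / 10
    variance = sum((v - mean) ** 2 for v in values) / 10
    return mean, variance


def model_series(code, n, seed):
    """Code n pseudo-random digits (a stand-in for the Tippett numbers)."""
    rng = random.Random(seed)
    return [code[rng.randrange(10)] for _ in range(n)]


print(moments(CODE_1))
print(moments(CODE_2))
print(model_series(CODE_1, 20, seed=1))
```

Under the assumption of equally likely digits, the two codes give zero means and the variances 0.2 and 0.6 stated in (85).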

The first 100 elements in each of the $\alpha$-series will now be quoted.


Table 1. (1) Model series $(\alpha_t^{(1)})$; first 100 elements.

  0 -1  0  0  0  0  0  0  0 -1 -1  0 -1  0 -1  0  0 -1  0 -1
  0 -1  0  0  0  0  0  1  0  0  0  0  0  0  0  0 -1  0  0  0
  0  0  0  0  0  0 -1  0  0  0  1  0  0  0  0  0  0  0  1  1
  0  0 -1  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0
  0  0  0  0  0  1  0 -1  0 -1  0  0  0  0 -1  0  1  0  0  1







(2) Model series $(\alpha_t^{(2)})$; first 100 elements.

  1  1  0  0  1 -1  0 -1 -1  0  0 -1  0  1  1  0  0  0 -1  0
 -1  0 -1  0  1  0  1 -1  0  0 -1 -1  0  0  0  0  0 -1  1  1
  1 -1  1  1 -1  1  0  0  0 -1 -1 -1  1  0  1  1 -1 -1  1  1
  1  1  1  1  0  1  0  1  1  0  1  1  0  1  1  1 -1 -1 -1 -1
  0 -1 -1  0  0  0  1  1  1 -1  0  0  1  0 -1  0 -1 -1  1 -1


The $\alpha$-series will be used repeatedly in the present study. For this reason each
of the series, considered a statistical population with hypothetical distribution
functions $F_1(u)$ and $F_2(u)$ respectively, has been tested as to the goodness of fit
between the empirical and the theoretical distributions. Denoting the empirical
distribution functions by $F_1^*(u)$ and $F_2^*(u)$ respectively, these were found to be

(86)   $F_1^*(u) = 0$ for $u < -1$,   $F_1^*(u) = 0.100$ for $-1 \le u < 0$,
       $F_1^*(u) = 0.908$ for $0 \le u < 1$,   $F_1^*(u) = 1$ for $1 \le u$;

(87)   $F_2^*(u) = 0$ for $u < -1$,   $F_2^*(u) = 0.321$ for $-1 \le u < 0$,
       $F_2^*(u) = 0.692$ for $0 \le u < 1$,   $F_2^*(u) = 1$ for $1 \le u$.

The $\omega^2$-test (see p. 21) indicates a nice fit. As is readily verified by the insertion
of (86) and (87), the two $\alpha$-series give the $\omega^2$-values 0.000064 and 0.000505, while the
corresponding expectations are 0.000180 and 0.000420.
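
The four figures just quoted can be recomputed in a few lines. The sketch below is an illustration under an assumption: the definition of $\omega^2$ on p. 21 is not reproduced in this section, so the measure is taken here to be the integral of $(F^* - F)^2$ over $u$, with expectation $(1/n)$ times the integral of $F(1 - F)$. This reading does reproduce the quoted values. The function names are hypothetical.

```python
def omega_squared(empirical, theoretical):
    """Integral of (F* - F)^2 over u for step functions with unit-length steps."""
    return sum((e - t) ** 2 for e, t in zip(empirical, theoretical))


def expected_omega_squared(theoretical, n):
    """(1/n) times the integral of F(1 - F) over u, for a sample of n elements."""
    return sum(t * (1 - t) for t in theoretical) / n


# Heights of the distribution functions on -1 <= u < 0 and 0 <= u < 1.
F1, F1_emp = (0.1, 0.9), (0.100, 0.908)
F2, F2_emp = (0.3, 0.7), (0.321, 0.692)

for emp, theo in ((F1_emp, F1), (F2_emp, F2)):
    print(round(omega_squared(emp, theo), 6),
          round(expected_omega_squared(theo, 1000), 6))
```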

The model series $(\alpha)$ have also been tested with regard to the concordance between
the serial coefficients $r_k(\alpha_t^{(i)})$ and the autocorrelation coefficients $r_k$. The
latter vanish for $k \ne 0$. On the other hand, writing $n$ for the number of elements
in the series, and paying no regard to terms of order $1/n$, the sampling dispersion
of any autocorrelation coefficient of the two series is found to be
$1/\sqrt{n} = 0.032$. The first five serial coefficients are given below.

                          k = 1     k = 2     k = 3     k = 4     k = 5
$r_k(\alpha_t^{(1)})$      0.057     0.047     0.010     0.015     0.036
$r_k(\alpha_t^{(2)})$      0.046     0.011     0.006    -0.004    -0.004

The deviation from the corresponding autocorrelation coefficient is in no case
larger than the double dispersion. It is rather interesting to note that although the
series consist of as many as 1000 elements, a serial coefficient amounting to 0.06
cannot be considered significantly positive.
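
A serial coefficient and its approximate sampling dispersion may be computed as sketched below. The exact definition of the serial coefficient used in this study is given elsewhere; the form adopted here (deviations from the overall mean, divided by the total sum of squares) is a common one and is an assumption of the sketch, as are the function names.

```python
from math import sqrt


def serial_coefficient(x, k):
    """Serial coefficient of lag k, with deviations taken from the overall mean."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + k] for t in range(n - k)) / sum(d * d for d in dev)


def sampling_dispersion(n):
    """Approximate dispersion 1/sqrt(n) of a serial coefficient of a purely random series."""
    return 1 / sqrt(n)


print(round(sampling_dispersion(1000), 3))
# A strictly alternating series is almost perfectly negatively correlated at lag 1.
print(serial_coefficient([1, -1] * 50, 1))
```

With $n = 1000$ the double dispersion is about 0.063, which is why a serial coefficient of 0.06 does not count as significant.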

The first 20 values of the series $(\alpha_t^{(1)} - \alpha_t^{(2)})$ obtained from table 1
are given below to illustrate a linear operation with independent
random processes:

(88)   -1 -2  0  0 -1  1  0  1  1 -1 -1  1 -1 -1 -2  0  0 -1  1 -1

The series thus obtained is a model series of the process $\{\alpha^{(1)} - \alpha^{(2)}\}$.
Since in the present case $\{\alpha^{(1)} + \alpha^{(2)}\}$ is found to possess the same
defining distribution function as $\{\alpha^{(1)} - \alpha^{(2)}\}$, the series (88) also
forms a model series for the process $\{\alpha^{(1)} + \alpha^{(2)}\}$.
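
Both statements can be checked directly. The sketch below (an illustration, with hypothetical variable and function names) forms the differences of the first 20 elements quoted in table 1 and compares the exact distributions of the sum and the difference of two independent variables with the probabilities belonging to $F_1$ and $F_2$.

```python
from collections import Counter
from fractions import Fraction

# First 20 elements of the two series quoted in Table 1.
alpha1 = [0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1]
alpha2 = [1, 1, 0, 0, 1, -1, 0, -1, -1, 0, 0, -1, 0, 1, 1, 0, 0, 0, -1, 0]
difference = [x - y for x, y in zip(alpha1, alpha2)]

# Probabilities attached to F_1 and F_2.
P1 = {-1: Fraction(1, 10), 0: Fraction(8, 10), 1: Fraction(1, 10)}
P2 = {-1: Fraction(3, 10), 0: Fraction(4, 10), 1: Fraction(3, 10)}


def distribution(sign):
    """Exact distribution of alpha1 + sign * alpha2 for independent variables."""
    dist = Counter()
    for x, px in P1.items():
        for y, py in P2.items():
            dist[x + sign * y] += px * py
    return dict(dist)


print(difference)
print(distribution(-1) == distribution(+1))
```

The two distributions coincide because the distribution belonging to $F_2$ is symmetrical about zero.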

We shall next pass to some other type cases of the discrete
stationary process, denoted by β, γ and δ, which will be built up
by successive linear operations on the purely random processes.

β. The process of moving averages. According to the analysis
in section 13, a stationary process $\{\xi(t)\}$ will be obtained by taking

(89)   $\xi(t) = b_0 \cdot \eta(t) + b_1 \cdot \eta(t-1) + \cdots + b_h \cdot \eta(t-h),$

letting $\{\eta(t)\}$ represent a purely random process, say $\{\eta(t; F_\eta(u))\}$,
and $(b) = (b_0, b_1, \ldots, b_h)$ an arbitrary sequence of real numbers. The
type of process thus defined will be called the process of moving
averages. A specific process (89) of this type will sometimes be
denoted by $\{\xi(t; \eta)\}$. The purely random process $\{\eta(t)\}$ and the
variables $\eta(t_1, \ldots, t_n)$ will be called primary in respect of $\{\xi(t; \eta)\}$
and $\xi(t_1, \ldots, t_n)$ respectively.

Let it be observed that, for any constant $c > 0$, the variable (89)
is identical with that defined by the variables $\eta(t; F_\eta(c \cdot u))$ and by
the sequence $c \cdot (b) = (c \cdot b_0, c \cdot b_1, \ldots, c \cdot b_h)$. Therefore, the assumption
$b_0 = 1$ often imposed on (89) in the following will not restrict the
generality of the analysis. On the other hand, if $\sum b_i \ne 0$, we can
find an identical process such that $\sum b_i = 1$. Hence the name proposed
for the process.

The principal distribution functions $F_\xi(u)$ and $F_\eta(u)$ of $\{\xi(t)\}$ and
$\{\eta(t)\}$ respectively are connected by the relation

(90)   $F_\xi(u) = F_\eta(u/b_0) * F_\eta(u/b_1) * \cdots * F_\eta(u/b_h).$

In case $D(\eta(t))$ is finite, we obtain further

$D^2(\xi(t)) = (b_0^2 + b_1^2 + \cdots + b_h^2) \cdot D^2(\eta(t)).$
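
For a primary variable taking finitely many values, the convolution (90) and the variance relation can be evaluated exactly. The sketch below is an illustration only. It takes the probabilities belonging to $F_2$ for the primary distribution and the weights $(b) = (1, -1)$ as a hypothetical example; the function names are assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def moving_average_distribution(b, eta):
    """Exact distribution of b_0*eta_0 + ... + b_h*eta_h for independent eta_i.

    eta maps each value of the primary variable to its probability.
    """
    dist = {Fraction(0): Fraction(1)}
    for weight in b:
        nxt = defaultdict(Fraction)
        for s, ps in dist.items():
            for v, pv in eta.items():
                nxt[s + weight * v] += ps * pv   # one convolution step
        dist = dict(nxt)
    return dist


def variance(dist):
    """Variance of a finite discrete distribution."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())


eta = {-1: Fraction(3, 10), 0: Fraction(4, 10), 1: Fraction(3, 10)}
b = [1, -1]

xi = moving_average_distribution(b, eta)
print(sorted(xi.items()))
print(variance(xi), sum(w * w for w in b) * variance(eta))
```

The last line prints the same value twice, namely $(1 + 1) \cdot 0.6 = 1.2$.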

For exemplifying conditioned variables connected with a process
$\{\xi(t; \eta)\}$ of moving averages, let $m < h$, and let $C$ denote the
condition

$(C) = (\eta(t-m-1) = \eta_{m+1},\ \eta(t-m-2) = \eta_{m+2},\ \ldots,\ \eta(t-h) = \eta_h).$

Then the conditioned variable $\xi_C(t)$ will be given by

$\xi_C(t) = b_0 \cdot \eta(t) + \cdots + b_m \cdot \eta(t-m) + b_{m+1} \cdot \eta_{m+1} + \cdots + b_h \cdot \eta_h.$

As is readily verified, we have in case $D(\eta(t))$ is finite,

(91)   $E_C[\xi(t)] = (b_0 + b_1 + \cdots + b_m)\, E[\eta(t)] + b_{m+1} \cdot \eta_{m+1} + \cdots + b_h \cdot \eta_h.$

Model series for the process of moving averages are readily
obtained by applying a moving linear operation to a model series
for the purely random process. For instance, the series of differences
of any order obtained from a purely random series
$\eta_t$ will illustrate the process of moving averages. Below, as an
illustrative model series are given the first 100 values of
$\Delta\alpha_t^{(2)} = \alpha_{t+1}^{(2)} - \alpha_t^{(2)}$ obtained from Table 1. The corresponding process
will be denoted

(92)   $\{\beta(t)\} = \{\alpha^{(2)}(t+1; F_2) - \alpha^{(2)}(t; F_2)\}.$


Table 2. Model series $(\beta_t)$; first 100 elements.

  0 -1  0  1 -2  1 -1  0  1  0 -1  1  1  0 -1  0  0 -1  1 -1
  1 -1  1  1 -1  1 -2  1  0 -1  0  1  0  0  0  0 -1  2  0  0
 -2  2  0 -2  2 -1  0  0 -1  0  0  2 -1  1  0 -2  0  2  0  0
  0  0  0 -1  1 -1  1  0 -1  1  0 -1  1  0  0 -2  0  0  0  1
 -1  0  1  0  0  1  0  0 -2  1  0  1 -1 -1  1 -1  0  2 -2  2


γ. The general process of linear regression. Let $\{\eta(t)\}$ stand for
a purely random process with finite dispersion $D(\eta)$, and let $b_0$,
$b_1, b_2, \ldots$ be a real sequence such that $\sum_{k=0}^{\infty} b_k^2$ is convergent. Finally,
we must assume either that $E[\eta] = 0$ or that $\sum_{k=0}^{\infty} b_k$ be convergent.
These alternatives will present themselves repeatedly in the sequel.
The former assumption is better suited to our purpose; the
modifications caused by the latter are trivial. Accordingly, we shall
always assume that $E[\eta] = 0$.

Considering the series

(93)   $b_0\, \eta(t) + b_1\, \eta(t-1) + b_2\, \eta(t-2) + \cdots,$

it follows from the independence of the variables $\eta(t)$ that the
variance of

(94)   $b_n\, \eta(t-n) + b_{n+1}\, \eta(t-n-1) + \cdots + b_{n+p}\, \eta(t-n-p)$

is given by

$(b_n^2 + b_{n+1}^2 + \cdots + b_{n+p}^2) \cdot D^2(\eta).$

The dispersion of (94) thus tends to zero uniformly in $p$ as $n \to \infty$.
Accordingly, the sum (93) is convergent (cf. p. 41). As remarked in
connexion with theorem 1, it follows that we may define a
stationary process $\{\xi(t)\}$ by writing

(95)   $\{\xi(t)\} = b_0\, \{\eta(t)\} + b_1\, \{\eta(t-1)\} + b_2\, \{\eta(t-2)\} + \cdots.$

By definition, this is the general formula for a process of linear
regression.

Having dealt under article β with a particular case of the general
process of linear regression, we shall in the next article build
up the scheme of autoregression as another type case. As a matter
of fact, in the present study the general process will be chiefly
used as a convenient tool for comprehension of the two type cases
mentioned, and it is only these which will appear in the applications.
It will also be sufficient to refer to the type cases for
model series.

δ. The process of linear autoregression. Let $(a) = (a_1, \ldots, a_h)$ stand
for real numbers such that $a_h \ne 0$, and that the roots of the
equation (34) all are of a modulus less than 1. Let further
$(b) = (b_1, b_2, \ldots)$ be a sequence such that (A) the difference equation

(96)   $x(t) + a_1 \cdot x(t-1) + \cdots + a_h \cdot x(t-h) = 0$

is satisfied when $x(t) = b_t$, and (B) the initial values $b_1, \ldots, b_h$ are
solutions of the following system of linear equations

(97)
$$\begin{aligned}
a_1 + b_1 &= 0 \\
a_2 + a_1 \cdot b_1 + b_2 &= 0 \\
&\;\;\vdots \\
a_{h-1} + a_{h-2} \cdot b_1 + \cdots + a_1 \cdot b_{h-2} + b_{h-1} &= 0 \\
a_h + a_{h-1} \cdot b_1 + \cdots + a_1 \cdot b_{h-1} + b_h &= 0.
\end{aligned}$$

Since the determinant of the system (97) is given by

$$\begin{vmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
a_1 & 1 & 0 & \cdots & 0 & 0 \\
a_2 & a_1 & 1 & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
a_{h-1} & a_{h-2} & \cdots & & a_1 & 1
\end{vmatrix}$$

and thus equals 1, the initial $b$-values are uniquely determined.
It is also seen that all $b_i$ are real. Moreover, the series $\sum_{s=0}^{\infty} b_s^2$ is
convergent (cf. section 6). Letting $\{\eta(t)\}$ represent a purely random
process with finite dispersion, the conditions indicated under article
γ thus will be satisfied. Hence, a stationary process $\{\xi(t)\} = \{\xi(t; \eta)\}$
will be defined by putting

(98)   $\xi(t; \eta) = \eta(t) + b_1\, \eta(t-1) + b_2\, \eta(t-2) + \cdots.$

By definition, this operation gives rise to the general process of
(linear) autoregression. Since $a_h \ne 0$, the autoregression will be said
to be of order $h$.
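
The determination of the weights $(b)$ from the coefficients $(a)$ may be sketched as follows. This is an illustration under two assumptions: equation (34), which is not reproduced in this section, is taken to be the characteristic equation $z^h + a_1 z^{h-1} + \cdots + a_h = 0$, and the coefficient values are hypothetical examples. The function names are likewise assumptions.

```python
import numpy as np


def regression_weights(a, count):
    """Weights b_0, ..., b_count with b_0 = 1, determined by (96) and (97).

    a holds (a_1, ..., a_h); b_s is taken as 0 for s < 0.
    """
    b = [1.0]
    for t in range(1, count + 1):
        b.append(-sum(a_i * b[t - i] for i, a_i in enumerate(a, start=1) if t - i >= 0))
    return b


def roots_inside_unit_circle(a):
    """Root condition, assuming (34) reads z^h + a_1 z^(h-1) + ... + a_h = 0."""
    return bool(np.all(np.abs(np.roots([1.0, *a])) < 1.0))


for a in ([0.8], [-0.2, 0.65]):            # hypothetical coefficient sets
    print(roots_inside_unit_circle(a), np.round(regression_weights(a, 5), 4))
```

For $h = 1$ the weights reduce to $b_k = (-a_1)^k$, so that the convergence of $\sum b_k^2$ is immediate when $|a_1| < 1$.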

As pointed out in section 13, the variables $\zeta(t)$ defined by the
following linear operation on the variables $\xi(t; \eta)$ given by (98)
will likewise constitute a stationary process:

$\zeta(t) = \xi(t) + a_1 \cdot \xi(t-1) + \cdots + a_h \cdot \xi(t-h).$

It will now be shown that the process $\{\zeta(t)\}$ thus defined is equivalent
with $\{\eta(t)\}$. The proof is based on a transformation of a double sum
of aleatory variables.

By definition, we have

$$\zeta(t) = \sum_{i=0}^{h} a_i \sum_{k=0}^{\infty} b_k \cdot \eta(t-i-k)
= \lim_{N \to \infty} \sum_{i=0}^{h} a_i \sum_{k=0}^{N} b_k \cdot \eta(t-i-k),$$

where $a_0$ and $b_0$ should be given the value 1. Introducing an
auxiliary variable $\zeta_N(t)$, an elementary transformation shows that

$$\begin{aligned}
\zeta_N(t) &= \sum_{i=0}^{h} a_i \sum_{k=0}^{N} b_k\, \eta(t-i-k) \\
&= \sum_{p=0}^{h-1} \eta(t-p) \sum_{q=0}^{p} a_q b_{p-q}
 + \sum_{p=h}^{N} \eta(t-p) \sum_{q=0}^{h} a_q b_{p-q}
 + \sum_{p=N+1}^{N+h} \eta(t-p) \sum_{q=p-N}^{h} a_q b_{p-q} \\
&= \eta(t) + c_1 \cdot \eta(t-N-1) + \cdots + c_h \cdot \eta(t-N-h),
\end{aligned}$$

where the second transformation is a consequence of (96) and (97).
Putting $a = \max |a_i|$, the coefficients $c_s$ introduced are seen to
satisfy the inequality

(99)   $|c_s| \le a \cdot (|b_{N+s-h}| + |b_{N+s+1-h}| + \cdots + |b_{N-1}| + |b_N|).$

Paying regard to the convergence of $\sum b_k^2$, we conclude without
difficulty that $c_1\, \eta(t-N-1) + \cdots + c_h\, \eta(t-N-h)$ tends to zero
in probability as $N \to \infty$. Thus, $\zeta_N(t)$ tends to $\eta(t)$ in probability.
According to the corollary of theorem 1, the proof that $\{\zeta(t)\}$ equals
$\{\eta(t)\}$ is thereby completed, and we get the following fundamental
identity,

(100)   $\{\xi(t)\} + a_1 \cdot \{\xi(t-1)\} + \cdots + a_h \cdot \{\xi(t-h)\} = \{\eta(t)\}.$

The relation (100) says that we have, for every $t = 0, \pm 1, \pm 2, \ldots$,

(101)   $\xi(t) + a_1 \cdot \xi(t-1) + \cdots + a_h \cdot \xi(t-h) = \eta(t).$

Since by assumption $\eta(t)$ is independent of $\eta(t-1), \eta(t-2), \ldots$,
and, according to (98), also of $\xi(t-1), \xi(t-2), \ldots$, the relation
(101) shows that the variables $\xi(t), \xi(t-1), \ldots, \xi(t-h)$ are
connected by a relation of linear regression. Hence the name
proposed for the process.

A simple illustration of conditioned variables is given by

$\xi_C(t) = \eta(t) - a_1\, \xi_{t-1} - a_2\, \xi_{t-2} - \cdots - a_h\, \xi_{t-h},$

where $(C) = (\xi(t-1) = \xi_{t-1},\ \xi(t-2) = \xi_{t-2},\ \ldots,\ \xi(t-h) = \xi_{t-h})$.

More general formulae will be given in section 23.

Construction of model series for a process of autoregression,
given for example by (98), may be performed in the same way as
in the case of a finite moving average of a purely random series.
The difficulty of an infinite number of weights is but apparent,
for when a certain precision in the calculations is fixed, only a
finite number of the weights, say $H$ of them, will be found to have
any influence.

Denoting by $\delta_t$ the values in a model series of the present type,
and representing the primary model series by $(\alpha_t)$, the formula for
the construction reads (cf. (63))

$\delta_t = \alpha_t + b_1 \cdot \alpha_{t-1} + b_2 \cdot \alpha_{t-2} + \cdots + b_H \cdot \alpha_{t-H}.$

Having constructed $\delta_1, \delta_2, \ldots, \delta_H$ according to this formula, the
subsequent values $\delta_{H+1}, \delta_{H+2}$, etc. may be obtained from the more
convenient recurrence formula

(102)   $\delta_t = \alpha_t - a_1 \cdot \delta_{t-1} - a_2 \cdot \delta_{t-2} - \cdots - a_h \cdot \delta_{t-h}.$

In the illustrative series given below, a slight simplification has
been made, in that formula (102) has been applied also for
$t = 1, 2, \ldots, H$, taking $\delta_t = 0$ for $t < 1$. As the series consist of 1000
elements each, this modification of the first few elements will not
have any disturbing effect upon serial coefficients and other quantities
relating to the whole of the series. On the other hand, thanks
to the modification adopted, the construction of the model series
may be readily followed in detail.

Three illustrative series, denoted by $(\delta_t^{(1)})$, $(\delta_t^{(2)})$ and $(\delta_t^{(3)})$ respectively,
will be presented. The formulae of type (100) for the corresponding
processes read

(103)   $\{\delta^{(1)}(t)\} = \{\alpha^{(2)}(t)\} - 0.8\, \{\delta^{(1)}(t-1)\},$

(104)   $\{\delta^{(2)}(t)\} = \{\alpha^{(1)}(t)\} + 0.8\, \{\delta^{(2)}(t-1)\},$

(105)   $\{\delta^{(3)}(t)\} = \{\alpha^{(2)}(t)\} + 0.2\, \{\delta^{(3)}(t-1)\} - 0.65\, \{\delta^{(3)}(t-2)\}.$

The verification of the recurrent calculation of the $\delta$-series needs
no comment.
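
The recurrent calculation may nevertheless be sketched in a few lines. The sketch below is an illustration, not the original computation. It applies the recurrence (102) from the first element onward, with $\delta_t = 0$ for $t < 1$, to the first ten elements of the series of table 1. Each $\delta$-series is paired with its primary series as read from (103) to (105), a pairing which the first elements of table 3 bear out. The function name is an assumption.

```python
def autoregressive_series(alpha, a):
    """delta_t = alpha_t - a_1*delta_(t-1) - ... - a_h*delta_(t-h), delta_t = 0 for t < 1."""
    delta = []
    for t, value in enumerate(alpha):
        past = sum(a_i * delta[t - i] for i, a_i in enumerate(a, start=1) if t - i >= 0)
        delta.append(value - past)
    return delta


alpha1 = [0, -1, 0, 0, 0, 0, 0, 0, 0, -1]    # first ten elements of Table 1 (1)
alpha2 = [1, 1, 0, 0, 1, -1, 0, -1, -1, 0]   # first ten elements of Table 1 (2)

delta1 = autoregressive_series(alpha2, [0.8])          # relation (103)
delta2 = autoregressive_series(alpha1, [-0.8])         # relation (104)
delta3 = autoregressive_series(alpha2, [-0.2, 0.65])   # relation (105)

for series in (delta1, delta2, delta3):
    print([round(v, 2) for v in series])
```

In the notation of (102), relation (103) corresponds to $a_1 = 0.8$, relation (104) to $a_1 = -0.8$, and relation (105) to $a_1 = -0.2$, $a_2 = 0.65$.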


Table 3. (1) Model series $(\delta_t^{(1)})$. First 50 elements.

  1.00   0.20  -0.16   0.13   0.90  -1.72   1.37  -2.10   0.68  -0.54
  0.44  -1.35   1.08   0.14   0.90  -0.72   0.57  -0.46  -0.64   0.51
 -1.41   1.12  -1.90   1.52  -0.22   0.17   0.86  -1.69   1.35  -1.08
 -0.13  -0.89   0.71  -0.57   0.46  -0.37   0.29  -1.23   1.99  -0.59
  1.47  -2.18   2.74  -1.19  -0.04   1.04  -0.83   0.66  -0.53  -0.58




(2) Model series $(\delta_t^{(2)})$. First 50 elements.

  0.00  -1.00  -0.80  -0.64  -0.51  -0.41  -0.33  -0.26  -0.21  -1.17
 -1.93  -1.56  -2.24  -1.79  -2.43  -1.95  -1.56  -2.26  -1.80  -2.44
 -1.95  -2.56  -2.05  -1.64  -1.31  -1.05  -0.84   0.33   0.26   0.21
  0.17   0.14   0.11   0.09   0.07   0.06  -0.96  -0.76  -0.61  -0.49
 -0.39  -0.31  -0.26  -0.20  -0.16  -0.13  -1.10  -0.88  -0.71  -0.56



(3) Model series $(\delta_t^{(3)})$. First 50 elements.

  1.00   1.20  -0.41  -0.86   1.09  -0.22  -0.76  -1.01  -0.71   0.51
  0.56  -1.22  -0.61   1.67   1.73  -0.74  -1.27   0.23  -0.13  -0.17
 -0.95  -0.08  -0.40  -0.03   1.25   0.27   0.24  -1.13  -0.38   0.66
 -0.62  -1.56   0.09   1.03   0.16  -0.64  -0.22  -0.63   1.02   1.61
  0.66  -1.92   0.19   2.28  -0.67  -0.62   0.31   0.47  -0.11  -1.32

π. On the periodic processes. In this article, a stationary process
will be constructed which belongs to the class of singular processes
as introduced in section 14. A distribution function $F(u)$ and an
integer $h$ being arbitrarily given, there will be constructed a
stationary process, say $\{\xi(t)\}$, with the following properties: (a) the
process will be singular by means of the relation

(106)   $\xi(t) - \xi(t-h) = 0,$

(b) the process has $F(u)$ for principal distribution function. According
to the construction, any sample series, say $(\xi_t, \xi_{t+1}, \ldots)$, will be
strictly periodic, and with period $h$. The process will, accordingly,
be termed a periodic process.

The simple construction device reads as follows. Let $\xi = [\xi^{(1)}, \ldots, \xi^{(h)}]$
represent an $h$-dimensional aleatory variable such that (A) the distribution
function of $\xi$, say $F(u_1, \ldots, u_h)$, is symmetrical in respect of
the variables $u_i$, (B) all the variables $\xi^{(i)}$ have $F(u)$ for distribution
function. Now, let a sequence of multi-dimensional aleatory variables
be defined by

$\xi^{(1)},\ [\xi^{(1)}, \xi^{(2)}],\ \ldots,\ [\xi^{(1)}, \ldots, \xi^{(h)}],$
$[\xi^{(1)}, \ldots, \xi^{(h)}, \xi^{(1)}],\ [\xi^{(1)}, \ldots, \xi^{(h)}, \xi^{(1)}, \xi^{(2)}],\ \ldots,\ [\xi^{(1)}, \ldots, \xi^{(h)}, \xi^{(1)}, \ldots, \xi^{(h)}],$
$[\xi^{(1)}, \ldots, \xi^{(h)}, \xi^{(1)}, \ldots, \xi^{(h)}, \xi^{(1)}],\ \ldots$

It is evident that these variables may be taken for the variables

$\xi(t),\ \xi(t, t-1),\ \ldots,\ \xi(t, t-1, \ldots, t-h+1),$
$\xi(t, \ldots, t-h+1, t-h),\ \xi(t, \ldots, t-h+1, t-h, t-h-1),\ \ldots,$
$\xi(t, \ldots, t-h+1, t-h, \ldots, t-2h+1),$
$\xi(t, \ldots, t-h+1, t-h, \ldots, t-2h+1, t-2h),\ \ldots$

connected with a stationary process $\{\xi(t)\}$. It needs no comment
that this process has the advanced properties (a) and (b).

For exemplifying conditioned probability distributions connected
with the periodic process constructed above, let $C$ stand for a
condition implying $\xi(t) = \xi_t$, and let $n$ represent an integer. Then
$P_C[\xi(t + n \cdot h) \subset S] = 1$ if $\xi_t$ belongs to the set $S$, otherwise $P_C = 0$.
Further, $E_C[\xi(t + n \cdot h)] = \xi_t$.

For the construction of a model series, say $\pi_1, \pi_2, \ldots$, illustrating
the periodic process defined above, it will be sufficient to form a
model random sample $\pi_1, \pi_2, \ldots, \pi_h$ on the basis of the distribution
function $F(u)$, and then take $\pi_{h+1} = \pi_1$, $\pi_{h+2} = \pi_2$, etc. Still
simpler, a model series with $h = 2$ has been obtained by letting a
coin-throw decide whether $\pi_1$ should equal 1 or $-1$, and by taking
$\pi_{2i} = -\pi_1$, $\pi_{2i+1} = \pi_1$. The resulting series reads

(107)   1 -1  1 -1  1 -1  1 -1  1 -1  1 ...

This series will be referred to as the $\pi$-series.
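
The construction device amounts to repeating a random sample with period $h$. A minimal sketch follows, in which a seeded pseudo-random choice stands in for the coin-throw; the function name and the seed are assumptions of the illustration.

```python
import random


def periodic_series(sample, length):
    """Repeat a random sample pi_1, ..., pi_h with period h."""
    h = len(sample)
    return [sample[t % h] for t in range(length)]


rng = random.Random(0)          # stand-in for the coin-throw
pi_1 = rng.choice([1, -1])
series = periodic_series([pi_1, -pi_1], 11)
print(series)
```

Whatever the outcome of the throw, the series satisfies the relation (106) with $h = 2$.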

Considering the difference equation (14) corresponding to the
relation of singularity (106) of the periodic process constructed,
this equation has for general solution a sum of harmonics, viz.
the expression (15), a circumstance in full agreement with theorem 2.
This aspect corresponds to interpreting a sample series $\pi_t$ by means
of Fourier analysis as a composed harmonic (cf. section 5).

Having so far shown that the class of singular stationary processes
introduced in section 14 is not vacuous, the next question is
if there exist singular processes which satisfy relations (77) of a
more general type than (106). As a matter of fact, any particular
relation generalizing (106) will impose certain conditions upon the
distribution functions $\{F\}$ of the singular process. Leaving open
the question of which special distribution functions may present
themselves in case of a special singularity of type (77), we advance
that the normal process (see section 16) will be found to admit
any relation (83) as a singular case. After this reference to a process
of superposed harmonics, only one of the schemes surveyed in
Chapter I, viz. the scheme of hidden periodicities, remains to be
interpreted as a stationary process.

Ω. The process of hidden periodicities. Let $\{\xi^{(1)}(t)\}, \ldots, \{\xi^{(k)}(t)\}$
represent independent stationary processes. According to section
13, the sum $\{\xi(t)\} = \{\xi^{(1)}(t)\} + \cdots + \{\xi^{(k)}(t)\}$ will constitute a stationary
process. If at least one of the processes $\{\xi^{(i)}(t)\}$ is a periodic
process, or a process of superposed harmonics, $\{\xi(t)\}$ will be called a
process of hidden periodicities. In particular, letting $k = 2$, and
taking for $\{\xi^{(1)}(t)\}$ a process of superposed harmonics, and for
$\{\xi^{(2)}(t)\}$ a purely random process, we get the simple scheme of
hidden periodicities dealt with in section 8.

A model series for the process of hidden periodicities may be
obtained from independently constructed model series for the purely
random process and the periodic process. Taking one model series
of each type, the series obtained by summing corresponding
elements will form a model series for the process of hidden
periodicities. The table below gives the first elements in two model
series for the processes $\{\Omega^{(1)}\}$ and $\{\Omega^{(2)}\}$ defined by

$\Omega^{(i)}(t) = \pi(t) + \alpha^{(i)}(t),$

the $\alpha$- and $\pi$-processes being defined in corresponding articles of
the present section. The two $\Omega$-series consist of 1000 elements
each. The construction may be followed in detail by means of
table 1 and the series (107) (cf. formula (39)).
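
The summation of corresponding elements may be followed in a short sketch. It is an illustration only. It uses the first 20 elements of the two series of table 1 together with the series (107); the function and variable names are assumptions.

```python
def hidden_periodicity_series(periodic, random_series):
    """Sum corresponding elements of a periodic and a purely random model series."""
    return [p + a for p, a in zip(periodic, random_series)]


pi = [1 if t % 2 == 0 else -1 for t in range(20)]   # the series (107)
alpha1 = [0, -1, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0, -1, 0, -1, 0, 0, -1, 0, -1]
alpha2 = [1, 1, 0, 0, 1, -1, 0, -1, -1, 0, 0, -1, 0, 1, 1, 0, 0, 0, -1, 0]

omega1 = hidden_periodicity_series(pi, alpha1)
omega2 = hidden_periodicity_series(pi, alpha2)
print(omega1)
print(omega2)
```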


Table 4. (1) Model series $(\Omega_t^{(1)})$; first 100 elements.

  1 -2  1 -1  1 -1  1 -1  1 -2  0 -1  0 -1  0 -1  1 -2  1 -2
  1 -2  1 -1  1 -1  1  0  1 -1  1 -1  1 -1  1 -1  0 -1  1 -1
  1 -1  1 -1  1 -1  0 -1  1 -1  2 -1  1 -1  1 -1  1 -1  2  0
  1 -1  0 -1  1 -1  1  0  1 -1  1 -1  1 -1  2 -1  1 -1  1 -1
  1 -1  1 -1  1  0  1 -2  1 -2  1 -1  1 -1  0 -1  2 -1  1  0






(2) Model series $(\Omega_t^{(2)})$; first 100 elements.

  2  0  1 -1  2
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the case of hidden periodicities, we note the simple relations 

$\Omega_C(t + n \cdot h) = \pi_t + \xi(t + n \cdot h); \qquad E_C[\Omega(t + n \cdot h)] = \pi_t + E[\xi].$

Here $\{\pi(t)\}$ and $\{\xi(t)\}$ represent independent stationary processes
with sum $\{\Omega(t)\}$. The process $\{\pi(t)\}$ is assumed to be periodic with
period $h$, while $n$ is an arbitrary integer, and $C$ stands for the
condition $(C) = [\pi(t) = \pi_t]$.

The scheme of hidden periodicities (39) and its many important
applications are well-known from the text-books of time-series
analysis. We have now interpreted this scheme as a special case of the
stationary process. In what follows the scheme (39) with its rigid
periodicities will be touched upon only incidentally. Our main theme
is the equally important but largely unexplored processes of linear
regression.


16. On the normal stationary process.


With reference to A. Kolmogoroff, A. Khintchine in a paper
(1934) returned to in the next section touches upon a continuous
stationary process constructed by means of normal distribution
functions. Mutatis mutandis, the same construction device will
supply a discrete stationary process. In the sequel, this process
will be termed the normal process. As a basis for illustrating the
general stationary process, the normal process will prove very
useful. In fact, in spite of the formal developments connected
with the general normal process being of a simple structure, the
normal process will, by proper specializations, be able to illustrate
any type of stationary process mentioned in the previous section,
and besides, as already advanced, the singular processes
satisfying relations of type (83).

Before going into details concerning the normal stationary process, 
it will be convenient to introduce the concept of a general normal 
distribution in an enumerable set of variables. 

Let an infinite quadratic form with real coefficients be given by

(108)   $Q(X_1, X_2, \ldots) = \sum_{p=1}^{\infty} \sum_{q=1}^{\infty} \mu_{pq} \cdot X_p X_q.$

In the following analysis, the variables $X_i$ may take on any real
values. Under such circumstances, the form $Q$ will not always be
convergent. If divergent, the form must be interpreted symbolically,
viz. as the comprehension of all finite forms of type

$Q_n(X_1, \ldots, X_n) = Q(X_1, X_2, \ldots, X_n, 0, 0, \ldots).$

Thus prepared, let (A) $\mu_{pq} = \mu_{qp}$, and (B) any determinant $\Delta^{(n)}$
of type

$$\Delta^{(n)} = \begin{vmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\
\vdots & & & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nn}
\end{vmatrix}$$

be non-negative. Then, taking for $(m) = (m_1, m_2, \ldots)$ an arbitrary
real sequence, a function $f(X_1, X_2, \ldots)$ in an enumerable set of
variables $(X) = (X_1, X_2, \ldots)$ will be defined by

(109)   $f(X_1, X_2, \ldots) = \exp\Bigl( i \sum_{p=1}^{\infty} m_p \cdot X_p - \tfrac{1}{2}\, Q(X_1, X_2, \ldots) \Bigr).$

As often as $Q$ is divergent, this function must be interpreted
symbolically in the same way as $Q$. Now, the function $f_n(X_1, \ldots, X_n)$
defined by

$f_n(X_1, \ldots, X_n) = f(X_1, \ldots, X_n, 0, 0, \ldots) = \exp\Bigl( i \sum_{p=1}^{n} m_p \cdot X_p - \tfrac{1}{2}\, Q_n(X_1, \ldots, X_n) \Bigr)$

is the characteristic function of a certain normal, $n$-dimensional
aleatory variable, say $[\xi^{(1)}, \ldots, \xi^{(n)}]$ (see e.g. H. Cramér (1937) p. 109).
We have $E[\xi^{(i)}] = m_i$ and $E[(\xi^{(i)} - m_i)(\xi^{(k)} - m_k)] = \mu_{ik}$. The form
$Q_n$ will thus be of the type (76).

In case $\Delta^{(n)} \ne 0$, the variable $[\xi^{(1)}, \ldots, \xi^{(n)}]$ will possess an
absolutely continuous distribution, and the density function, say
$\varphi_n(u_1, \ldots, u_n)$, will be given by

$\varphi_n(u_1, \ldots, u_n) = \dfrac{1}{(2\pi)^{n/2} \sqrt{\Delta^{(n)}}}\; e^{-\frac{1}{2} Q_n(u_1, \ldots, u_n)}.$

Here $Q_n(u_1, \ldots, u_n)$ is defined by

$Q_n(u_1, \ldots, u_n) = \sum_{p=1}^{n} \sum_{q=1}^{n} \dfrac{\Delta_{pq}^{(n)}}{\Delta^{(n)}} \cdot (u_p - m_p)(u_q - m_q),$

where $\Delta_{pq}^{(n)}$ denotes the cofactor of $\Delta^{(n)}$ belonging to the element
$\mu_{pq}$. Considering, on the other hand, the case $\Delta^{(n)} = 0$, let it be
assumed that $\Delta^{(n)}$ is of rank $h < n$. Writing the relations of
singularity on the form (69) after a suitable arrangement of the
variables $\xi^{(i)}$, it follows from the previous analysis that the variable
$[\zeta^{(1)}, \ldots, \zeta^{(n)}]$ defined by (72) will be composed of $h$, and only $h$,
non-constant variables $\zeta^{(i)}$. On the other hand, for $i > h$ the variables
$\zeta^{(i)}$ reduce to the constants $m_i^*$ given by (75). Expressing the
characteristic function of $[\zeta^{(1)}, \ldots, \zeta^{(n)}]$ in the variables $Z_i$, this will
reduce to

(110)   $f_n^*(Z_1, \ldots, Z_n) = \exp\Bigl( i \sum_{p=1}^{n} m_p^* \cdot Z_p - \tfrac{1}{2}\, Q^*(Z_1, \ldots, Z_h) \Bigr).$

Next, let us consider the infinite sequence $\xi = [\xi^{(1)}, \xi^{(2)}, \ldots]$. By
construction, every finite sub-group $[\xi^{(i_1)}, \ldots, \xi^{(i_n)}]$ will possess a
well-defined probability distribution, and, further, satisfy all consistency
relations of type (53)-(54). Thus $\xi$ will constitute a random variable
in an infinite number of dimensions. A variable $\xi$ of this type will
be termed normal. The function $f$ will be called the characteristic
function of $\xi$. This term is justified by the evident fact that the
distribution of $\xi$ is uniquely determined by $f$ and vice versa.


A necessary and sufficient condition that a normal distribution
as defined by (109) may be taken to define a variable $\xi(t, t-1, \ldots)$
connected with a stationary process $\{\xi(t)\}$ is that (a) $m_p$ reduces to
a constant, say $m$, and (b) $\mu_{p,q}$ is a function of $p - q$, say $\mu_{p-q}$.
In fact, taking $X_t, X_{t-1}, X_{t-2}, \ldots$ for variables in the characteristic
function (109), the coefficients of $X_{t-p}$ and of $X_{t-p} X_{t-q}$ will be
independent of $t$ when, and only when, the conditions (a) and (b)
are satisfied simultaneously.

According to the condition (A) attached to (108), we have
$\mu_{q-p} = \mu_{p-q} = \mu_{|p-q|}$. Further, disregarding the case of empty
determinants, the second condition, $\Delta^{(n)} \geq 0$, is seen to imply
$\mu_0 > 0$. Thus a set of real numbers $r_k = r_{-k}$ will be well-defined by
putting

$$r_k = \mu_k / \mu_0 .$$

In terms of these $r_k$, the conditions $\Delta^{(n)} \geq 0$ will reduce to the
inequalities (81).

According to the above, the general formula for the characteristic
function of the variable $\xi(t, t-1, \ldots) = \xi[t]$ connected with a normal
stationary process $\{\xi(t)\}$ is given by

(111)    $$f(X_t, X_{t-1}, \ldots) = e^{\, i m \sum_{p=0}^{\infty} X_{t-p} \, - \, \frac{\sigma^2}{2} \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} r_{|p-q|}\, X_{t-p} X_{t-q}}$$

where (A) $m$, $\sigma > 0$, and $r_k$ are real, and (B) the coefficients $r_k$ satisfy
the inequalities (81), viz. $\Delta(r, n) \geq 0$.
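
Condition (B) can be checked numerically for a finite number of orders. The following is a minimal sketch, not part of the original text: it assumes that the determinants in question are those of the matrices with elements $r_{|p-q|}$, and it uses a geometric sequence $r_k = \rho^{|k|}$ and an "oscillating" sequence that are purely hypothetical examples. A finite check of this kind can reject a sequence but cannot prove admissibility for all $n$.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    value = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-14:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            value = -value
        value *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return value


def toeplitz_determinants(r, n_max):
    """Determinants of the n x n matrices (r[|p-q|]) for n = 1..n_max."""
    return [
        det([[r[abs(p - q)] for q in range(n)] for p in range(n)])
        for n in range(1, n_max + 1)
    ]


def is_admissible(r, n_max, tol=1e-12):
    """True if every determinant up to order n_max is non-negative."""
    return all(d >= -tol for d in toeplitz_determinants(r, n_max))


rho = 0.5
geometric = [rho ** k for k in range(4)]   # hypothetical example: r_k = rho^|k|
oscillating = [1.0, 0.9, -0.9, 0.0]        # hypothetical example, fails at n = 3

print(toeplitz_determinants(geometric, 4))
print(is_admissible(geometric, 4), is_admissible(oscillating, 4))
```

For the geometric sequence the determinant of order $n$ equals $(1 - \rho^2)^{n-1}$, which gives a convenient cross-check on the elimination routine.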

It is seen that the normal stationary process defined by (111)
has for principal distribution function the normal distribution
function

$$\Phi(u) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{(v - m)^2}{2\sigma^2}}\, dv$$

and that the coefficients $r_k$ appearing in (111) are nothing else than
the autocorrelation coefficients of the process.
Thanks to theorem 2, the existence of a singular normal process
satisfying (83) will be proved if the autocorrelation coefficients of
any composed harmonic satisfying (30) are such that

(112)    $$\sum_{p=0}^{n} \sum_{q=0}^{n} r_{|p-q|}\, X_{t-p} X_{t-q} \geq 0$$

for any $n$, and for any real sequence $(X) = (X_t, X_{t-1}, X_{t-2}, \ldots)$.
For then, this form, and the quantity $m$ obtained from (30), satisfy
the conditions for introduction in the general formula (111) of the
characteristic function of the variables $\xi(t, t-1, \ldots)$ defining the
normal process, while the dispersion $\sigma > 0$ may be chosen freely.

The remaining proof involves no difficulty. Denoting by $x(t)$ an
arbitrary composed harmonic satisfying (30), let a set of integers
be given by $0 \leq i_1 < i_2 < \cdots < i_n$. Employing an argument used by
A. Khintchine (1934), the obvious relations

(113)    $$0 \leq \frac{1}{N} \sum_{s=1}^{N} \big[ X_{t-i_1} (x(t - i_1 + s) - m) + X_{t-i_2} (x(t - i_2 + s) - m) + \cdots + X_{t-i_n} (x(t - i_n + s) - m) \big]^2$$
$$= \sum_{p=1}^{n} \sum_{q=1}^{n} X_{t-i_p} X_{t-i_q} \cdot \frac{1}{N} \sum_{s=1}^{N} (x(t - i_p + s) - m)(x(t - i_q + s) - m)$$

will define a non-negative definite quadratic form in the variables
$X_{t-i_p}$. After this observation, let $N \to \infty$. Then, disregarding a
constant factor, the coefficient of $X_{t-i_p} X_{t-i_q}$ in the quadratic form
will, by definition, tend to the autocorrelation coefficient $r_{|i_p - i_q|}$ which
belongs to the composed harmonic considered (cf. (12)). Since the
form will remain $\geq 0$ also in the limit, and since the sequence
$i_p$ is arbitrary, the limit inequality implies (112).

Some circumstances connected with the singular processes merit
particular attention. In the first place, assuming $L[\xi(t) - m] = 0$
as given by (77) to be the relation of singularity of the lowest
possible order, the distribution of $\xi(t-1, \ldots, t-h)$ will be
non-singular. In other words, if in a sample series

$$\ldots, \xi_{t_0+1}, \xi_{t_0}, \xi_{t_0-1}, \ldots$$

the values $\xi_{t_0-1}, \ldots, \xi_{t_0-h+1}$ are known, neither $\xi_{t_0}$ nor $\xi_{t_0+1}, \ldots$
will be uniquely determined. On the other hand, if the sample
series section $\xi_{t_0-1}, \ldots, \xi_{t_0-h}$ is known, then $\xi_{t_0}$ and, recurrently,
$\xi_{t_0+1}, \ldots$ will, with probability one, be uniquely determined by
means of $L[\xi(t) - m] = 0$. Thus, $\xi_{t_0+t}$ may be regarded as a solution
of the difference equation (32) subject to the initial conditions
$\xi_{t_0-1}, \ldots, \xi_{t_0-h}$. According to the theory of difference equations (cf.
section 6), the periods $p_k$ of the individual harmonics in $\xi_{t_0+t}$ will
be determined by the coefficients $a_i$ in (77), and therefore be the
same in all sample series connected with the process considered.
On the other hand, the amplitudes $C_k$ and the phases $\varphi_k$ will be
determined by the initial values. As mentioned above, these contain
a random element, so the amplitudes and the phases of the individual
harmonics constituting $\xi_{t_0+t}$ will vary from one sample series
$(\ldots, \xi_{t_0+k}, \xi_{t_0+k-1}, \ldots)$ to another. Of course, any expectation connected
with the varying phases and amplitudes, e.g. $E[C_k^2]$, forms a
characteristic of the process, and may, considering e.g. a normal process
defined by (111), be expressed in terms of $m$, $\sigma$ and $r_k$ (cf. p. 73).

The above remarks apply, of course, to every singular process,
and thus both to the periodic processes and to the singular normal
processes. If the distribution of $\xi(t-1, \ldots, t-h)$ is absolutely
continuous, which is always the case in the normal process, a more
precise conclusion may be arrived at. In fact, let it in such case
be assumed that not all of the individual harmonics in (17) would
be present in $\xi_{t_0+t}$, i.e. that at least one harmonic would have a
vanishing amplitude $C_k$. Then $\xi_{t_0+t}$ must satisfy a difference equation
of order $h-1$ having a general solution satisfying also (32). Since
there are only $h$ such equations at most, the sample series sections
$(\xi_{t_0-1}, \ldots, \xi_{t_0-h})$ having the property assumed will form a set of
Borel measure zero in the space of $\xi(t-1, \ldots, t-h)$. Keeping in
mind the absolute continuity assumed, it follows that with probability
one all individual harmonics really will present themselves
when writing a sample series $\xi_{t_0+t}$ in the form (17).

The investigations of E. Slutsky (see (1927) and, e.g., (1937))
and V. Romanovsky ((1932), (1933)) concerning the "sinusoidal limit
law" fall under the theory of the stationary process, and present
some parallelism with the previous analysis of the concept of
singular process as introduced in section 14. Translating into the
terminology of the present study, these authors investigate certain
sequences of stationary processes, say $\{\beta^{(1)}(t)\}, \{\beta^{(2)}(t)\}, \ldots$ Denoting
by $r_k^{(p)}$ the autocorrelation coefficients of the process $\{\beta^{(p)}(t)\}$, and
by $x(t)$ a function (17) satisfying a linear relation $L[x(t) - m] = 0$,
the conditions imposed on the sequence $\{\beta^{(p)}(t)\}$ imply that $L[r^{(p)}] \to 0$
as $p \to \infty$ (see V. Romanovsky (1932), Theorem D). Representing
by $(\beta_1^{(p)}, \ldots, \beta_n^{(p)})$ a section in a sample series of $\{\beta^{(p)}(t)\}$, and holding
$n$ fixed, the sinusoidal limit theorem asserts that, for sufficiently large
values of $p$, the section considered will, with a probability as close
to one as desired, approximate a function of type $x(t)$ with any
prescribed accuracy.

A few reflections on the previous analysis will verify this theorem.
Writing $m_p = E[\beta^{(p)}(t)]$, let an auxiliary set of processes $\{\xi^{(1)}(t)\}$,
$\{\xi^{(2)}(t)\}, \ldots$ be defined by

$$\xi^{(p)}(t) = L[\beta^{(p)}(t) - m_p] = \sum_{k=0}^{h} a_k \cdot [\beta^{(p)}(t-k) - m_p].$$

It follows from the conditions imposed on $r_k^{(p)}$ that
the variables $\xi^{(p)}(t+k)$ will tend in probability to zero as $p \to \infty$.
It remains to prove that the composite variable $[\xi^{(p)}(t), \xi^{(p)}(t+1), \ldots,
\xi^{(p)}(t+n)]$ will, for any fixed $n$, tend in probability to $(0, 0, \ldots, 0)$
as $p \to \infty$. This, however, follows at once from the Boole inequality (64).

By examples of the type $\{\beta^{(p)}(t)\} = \sum_{i=1} a_{pi} \cdot \{\alpha(t-i)\}$, E. Slutsky
and V. Romanovsky show that their theorems are not empty.
While the added processes $\alpha$ used by Slutsky are all of the purely
random type, Romanovsky gives other examples as well. The
recent paper of E. Slutsky (1937) already referred to contains
references to certain related investigations, and illustrates in full
detail the behaviour of model series of the processes $\{\beta^{(p)}(t)\}$
considered. In full agreement with the sinusoidal limit theorem,
sections of moderate length in a model series approximate composed
harmonics (17) of the proper type. The periods of the individual
harmonics are the same in different sections, while the amplitudes
and phases vary. These features are seen to be analogous to the
properties of the singular processes proved above in connexion with
the singular normal process. This parallelism is not accidental.
In fact, an analysis of the sequences studied by E. Slutsky and
V. Romanovsky will show that these are convergent, and that the
limit processes are singular. In section 25 is given, i.a., a general
device for the construction of sequences ruled by the sinusoidal
limit theorem.


17. The autocorrelation coefficients as Fourier constants.

In a paper already referred to, A. Khintchine (1934) studies
what in a continuous statistical process with finite dispersion
corresponds to the autocorrelation coefficients in a discrete process,
viz. the function defined for any real $u$ by

$$R(u) = r(\xi(t), \xi(t+u)),$$

where $\{\xi(t)\}$ represents the continuous stationary process considered.
Terming $R(u)$ the correlation function of the process, Khintchine
gives, i.a., a necessary and sufficient condition that a function
$R(u)$ be the correlation function of a continuous stationary process.
Slightly modifying the result, the condition is that there exists a
distribution function, say $V(x)$, such that $V(0) = 0$, and

(114)    $$R(u) = \int_0^{\infty} \cos ux \cdot dV(x).$$

The inversion formula (see e.g. H. Cramér (1937), Theorem 9)

$$V(x) = \frac{2}{\pi} \int_0^{\infty} R(u)\, \frac{\sin ux}{u}\, du$$

shows that the function $V(x)$ is uniquely determined by $R(u)$.

We shall first give a corresponding theorem on the discrete
stationary process. It should be observed that the same theorem holds
for the generalized process (cf. section 11).

Theorem 5. Let $r_k$ ($k = 0, \pm 1, \pm 2, \ldots$) be an arbitrary sequence
of constants. A necessary and sufficient condition that there exists a
discrete stationary process with the $r_k$'s for autocorrelation coefficients
is that the $r_k$'s are the Fourier coefficients of a non-decreasing
function, say $W(x)$, such that $W(0) = 0$; $W(\pi) = \pi$,

(115)    $$r_k = \frac{1}{\pi} \int_0^{\pi} \cos kx \cdot dW(x).$$

Before passing to the proof we give the following inversion
formula involving a converging sum (see e.g. F. Hausdorff (1923),
p. 245)

(116)    $$W(x) = x + 2 \sum_{k=1}^{\infty} \frac{r_k}{k} \sin kx .$$

With suitable agreements as to points of discontinuity, the
function $W(x)$ thus will be uniquely determined by the autocorrelation
coefficients. In the sequel, $W(x)$ will be called the generating
function of the autocorrelation coefficients $r_k$.
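
The pair (115)-(116) can be tried out numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it takes a hypothetical absolutely summable sequence $r_k = \rho^{|k|}$ (truncated after 40 terms), differentiates (116) term by term to obtain $W'(x)$, and recovers the coefficients from (115) by a midpoint rule. The function names are invented for the example.

```python
import math


def w_prime(x, r):
    """Termwise derivative of (116): r_0 + 2 * sum_{k>=1} r_k cos(kx)."""
    return r[0] + 2.0 * sum(r[k] * math.cos(k * x) for k in range(1, len(r)))


def generating_function(x, r):
    """W(x) from (116): x + 2 * sum_{k>=1} (r_k / k) sin(kx)."""
    return x + 2.0 * sum(r[k] / k * math.sin(k * x) for k in range(1, len(r)))


def fourier_coefficient(k, r, steps=2000):
    """(1/pi) * integral over (0, pi) of cos(kx) W'(x) dx, cf. (115)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += math.cos(k * x) * w_prime(x, r)
    return total * h / math.pi


rho = 0.6                                  # hypothetical example value
r = [rho ** k for k in range(40)]          # r_0 = 1, r_k = rho^k
recovered = [fourier_coefficient(k, r) for k in range(5)]

print(recovered)
print(generating_function(math.pi, r))     # should reproduce W(pi) = pi
```

Since the truncated $W'(x)$ is a trigonometric polynomial, the midpoint rule integrates it essentially exactly, so the recovered values agree with $\rho^k$ to rounding error.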

The proof will use some facts concerning definite quadratic forms, 
facts parallel to the properties of definite functions used by A. 
Khintchine in the continuous case. In other respects the proofs 
are coincident. 

For a verification of the necessity of the condition given, let
$\{\xi(t)\}$ be an arbitrary discrete stationary process with finite
dispersion, say $\sigma$. Put $E[\xi(t)] = m$, let $r_k$ represent the autocorrelation
coefficients of the process $\{\xi(t)\}$, and consider the quadratic form

(117)    $$\sum_{p=0}^{n} \sum_{q=0}^{n} r_{|p-q|}\, X_p X_q , \qquad n = 0, 1, 2, \ldots$$

For any real sequence $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$ we have (cf. (113))

$$0 \leq \frac{1}{\sigma^2}\, E\Big[ \Big( \sum_{p=1}^{n} X_{t_p} (\xi(t_p) - m) \Big)^2 \Big] = \sum_{p=1}^{n} \sum_{q=1}^{n} r_{|t_p - t_q|}\, X_{t_p} X_{t_q} .$$

This relation implies that the forms (117) are non-negative definite.
Next, according to a theorem of G. Herglotz* (1911), this statement
is equivalent to saying that the following system of trigonometrical
moments,

(118)    $$\frac{1}{2\pi} \int_0^{2\pi} \cos kx \cdot dW(x) = r_k ; \qquad \frac{1}{2\pi} \int_0^{2\pi} \sin kx \cdot dW(x) = 0,$$

has a non-decreasing solution $W(x)$ with $W(0) = 0$.

The inversion of (118) gives exactly (116). The required relation
(115) follows directly from (116), and since (116) further gives

$$W(2\pi - x) = 2\pi - x - 2 \sum_{k=1}^{\infty} \frac{r_k}{k} \sin kx,$$

we obtain $W(x) = 2\pi - W(2\pi - x)$, and, finally, $W(\pi) = \pi$.

* Prof. T. Carleman has kindly informed me that the formulae (118) may be
obtained directly from the Hilbert representation of a definite quadratic form of
general type.

On the other hand, if $r_k$ is given by (115), the Herglotz theorem
asserts that the forms (117) are non-negative. Thus, the relation
(112) holds, and there exists a normal process with the given
$r_k$-values for autocorrelation coefficients.

The theorem proved above permits of some general conclusions
concerning the autocorrelation coefficients of a discrete stationary
process. Considering the coefficients $r_k$ for large $k$-values, their
behaviour will depend on the continuity structure of the generating
function $W(x)$. In order to study the situation in some
detail, let

$$W(x) = a^{(1)} \cdot W^{(1)}(x) + a^{(2)} \cdot W^{(2)}(x) + a^{(3)} \cdot W^{(3)}(x)$$

stand for the well-known (cf. e.g. H. Cramér (1928), p. 59)
representation of $W(x)$ as the sum of three uniquely determined,
non-decreasing functions with $W^{(1)}(0) = W^{(2)}(0) = W^{(3)}(0) = 0$,
$W^{(1)}(\pi) = W^{(2)}(\pi) = W^{(3)}(\pi) = \pi$, $a^{(1)} + a^{(2)} + a^{(3)} = 1$, $a^{(i)} \geq 0$, and

1) $a^{(1)} \cdot W^{(1)}(x) = \int_0^x \frac{dW(y)}{dy}\, dy$,

2) $a^{(2)} \cdot W^{(2)}(x)$, the saltus function, is equal to the sum of saltuses
of $W(x)$ at all the points of discontinuity which are less than or
equal to $x$; writing $\lambda_\nu$ for the saltus points of $W(x)$, and
$c_\nu^2 \cdot \pi/2$ for the corresponding saltuses, then

$$\frac{a^{(2)}}{\pi} \cdot W^{(2)}(x) = \sum_{\lambda_\nu \leq x} c_\nu^2 / 2,$$

3) $a^{(3)} \cdot W^{(3)}(x)$, the singular function, is a continuous function
which has almost everywhere a derivative equal to zero.

Thus prepared, we put

$$r_k^{(i)} = \frac{1}{\pi} \int_0^{\pi} \cos kx \cdot dW^{(i)}(x), \qquad i = 1, 2, 3,$$

and obtain

(119)    $$r_k = a^{(1)} \cdot r_k^{(1)} + a^{(2)} \cdot r_k^{(2)} + a^{(3)} \cdot r_k^{(3)} .$$

The components $r_k^{(i)}$ thus uniquely determined by the $r_k$-sequence
via its generating function (cf. (116)) are of entirely different
character.

As to $r_k^{(1)}$ we have (see e.g. H. S. Carslaw (1930), p. 271)

$$a^{(1)} \cdot r_k^{(1)} = \frac{1}{\pi} \int_0^{\pi} \frac{dW(x)}{dx} \cos kx\, dx \to 0 \quad \text{as } k \to \infty .$$

The component $r_k^{(2)}$ is given by

(120)    $$a^{(2)} \cdot r_k^{(2)} = \frac{1}{2} \sum_{\nu} c_\nu^2 \cos \lambda_\nu k .$$

It is seen that $r_k^{(2)}$ is an almost periodic function. Again referring
to (19), we conclude that arbitrarily large $k$-values exist for
which $a^{(2)} \cdot r_k^{(2)}$ approximates

$$a^{(2)} = \frac{a^{(2)}}{\pi} \cdot W^{(2)}(\pi) = \frac{1}{2} \sum_{\nu} c_\nu^2 .$$

The singular component $r_k^{(3)}$ permits of no unconditioned statement
as to its behaviour for large $k$-values.

In order to arrive at a criterion of the structure of $W(x)$, let it
be assumed that $\sum_{k=1}^{\infty} |r_k|$ is convergent. It follows from (116) that
in such a case $W(x)$ for all $x$ has a derivative $W'(x)$, that this will
be obtained by summing the derivatives of the terms in the right
member of (116), and that $W'(x)$ will be bounded. We thus obtain
the following corollary to theorem 5.

Corollary. Let $\{\xi(t)\}$ be a stationary process with autocorrelation
coefficients $r_k$ such that $\sum |r_k|$ is convergent. Then $W(x)$ will be
absolutely continuous,

$$W(x) = W^{(1)}(x).$$

The derivative $W'(x)$ is bounded in modulus, and given by

(121)    $$W'(x) = \sum_{k=-\infty}^{\infty} r_k \cos kx , \qquad 0 < x < \pi .$$

As a first application of the above analysis, we shall touch upon
some problems concerning the relation between continuous and
discrete stationary processes. The question to be put corresponds
to a problem dealt with by G. Elfving (1937) in a study on Markoff
chains.

A continuous stationary process, say $\{\xi^{(c)}(t)\}$, gives a hypothetical
scheme for the probability relations in any time points by means
of a set $\{F^{(c)}\}$ of distribution functions $F(t_1, \ldots, t_n; u_1, \ldots, u_n)$
satisfying (53)-(55) and referring to quite arbitrary time points $t_1, \ldots, t_n$.
Among these distribution functions, let those referring to integral
time points be represented by $\{F^{(d)}\}$. It is plain that the set $\{F^{(d)}\}$
thus obtained will define a discrete stationary process, say $\{\xi^{(d)}(t)\}$.
The situation may be described by saying that the hypothesis
$\{\xi^{(c)}(t)\}$ is consistent with the hypothesis $\{\xi^{(d)}(t)\}$.

Marking the symbols referring to consistent processes $\{\xi^{(c)}\}$ and
$\{\xi^{(d)}\}$ by $(c)$ and $(d)$ respectively, let $\{\xi^{(c)}\}$ have a finite dispersion.
Then, evidently, we have for any integral $k$ (cf. (114))

$$r_k^{(d)} = R^{(c)}(k).$$

Further, for any $x$ in the interval $(0, \pi)$, we have

(122)    $$W(x) = \pi \cdot \sum_{n=0}^{\infty} [V(n \cdot 2\pi + x) - V(n \cdot 2\pi - x)].$$

In fact, inserting the right member of (122) in (115), and paying
regard to (114), we obtain by elementary transformations for integral
$k$-values

$$r_k^{(d)} = \int_0^{\pi} \cos kx \cdot \sum_{n=0}^{\infty} d[V(n \cdot 2\pi + x) - V(n \cdot 2\pi - x)] = R^{(c)}(k).$$

We see that (122) illustrates the ambiguity about periods in the
discrete case (cf. p. 15). A harmonic with period $p < 2$ will in the
continuous case correspond to an increase in $V(\lambda)$ for $\lambda > \pi$. In the
discrete case, such a harmonic being read off only for integral time
points, its values will coincide with those of a certain harmonic
with period $p > 2$ and frequency $\lambda < \pi$.
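
The ambiguity can be made concrete with a few lines of code. The sketch below is an illustration only; the helper name `alias_frequency` and the chosen period 1.25 are assumptions made for the example. It folds a frequency $\lambda > \pi$ back into $(0, \pi)$ and confirms that the two cosine harmonics agree at integral time points.

```python
import math


def alias_frequency(lam):
    """Frequency in [0, pi] giving the same cosine values at integer times."""
    lam = lam % (2.0 * math.pi)
    return lam if lam <= math.pi else 2.0 * math.pi - lam


lam = 2.0 * math.pi / 1.25            # hypothetical harmonic with period 1.25 < 2
lam_alias = alias_frequency(lam)      # folded frequency, below pi

# At integer t the two cosines coincide; for a sine term the sign is
# reversed, which amounts to a change of phase.
max_gap = max(abs(math.cos(lam * t) - math.cos(lam_alias * t)) for t in range(50))

print(2.0 * math.pi / lam_alias)      # period of the coinciding harmonic
print(max_gap)
```

Here the harmonic with period 1.25 is indistinguishable, at integral time points, from one with period 5.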

On the other hand, let a discrete process $\{\xi^{(d)}(t)\}$ be defined by
a set $\{F^{(d)}\}$ of distribution functions referring to integral time
points. Our question is whether there exists an "interpolating"
continuous process $\{\xi^{(c)}(t)\}$ defined by a set $\{F^{(c)}\}$ where the
distribution functions referring to integral time points are identical with the
given set $\{F^{(d)}\}$. Studying in the first place the autocorrelation
coefficients $r_k^{(d)} = \frac{1}{\pi} \int_0^{\pi} \cos kx\, dW(x)$, we seek a continuous stationary
process $\{\xi^{(c)}\}$ with correlation function $R^{(c)}(u)$ such that, for
integral $k$-values, $R^{(c)}(k) = r_k^{(d)}$. Taking $V(x) = \frac{1}{\pi} W(x)$ for $0 \leq x \leq \pi$
and $V(x) = 1$ for $x \geq \pi$, such a function is evidently yielded by

(123)    $$R^{(c)}(u) = \int_0^{\infty} \cos ux \cdot dV(x).$$

It should be observed that (123) remains unchanged for integral
$u$-values when substituting, for instance, $V(x - n \cdot 2\pi)$ for $V(x)$,
letting $n$ denote a positive integer. Thus, the interpolating function
$R^{(c)}(u)$ will by no means be uniquely determined by the
autocorrelation coefficients $r_k^{(d)}$ given. This indeterminateness is, of
course, analogous to the circumstance mentioned in section 4 that
there exists an infinite number of simple harmonics all of which
pass through all of the values $x(t_n)$ taken on in equidistant points
$t_n$ by a simple harmonic $x(t)$.

In case the process $\{\xi^{(d)}\}$ considered is normal, the
Khintchine-Kolmogoroff device for constructing a normal continuous process
may be applied on the basis of an interpolating $R^{(c)}(u)$ as given,
for instance, by (123). Furnishing the normal continuous process
with the same mean and the same dispersion as $\{\xi^{(d)}\}$, the resulting
process will possess the property desired, i.e. give rise to the
primary discrete process $\{\xi^{(d)}\}$ when considering the probability
relations in integral time points.

Illustration. Taking $V(x) = 1 - e^{-x}$, formula (114) gives

$$R^{(c)}(u) = 1 / (1 + u^2).$$

The sum (122) is readily computed, and gives

(124)    $$W(x) = \pi \Big[ 1 - e^{-x} + \frac{e^{x} - e^{-x}}{e^{2\pi} - 1} \Big] ; \qquad 0 < x < \pi .$$

Insertion of this expression in (115) gives without difficulty

(125)    $$r_k = \frac{1}{1 + k^2} + \frac{2k \sin \pi k}{(e^{\pi} - e^{-\pi})(1 + k^2)} .$$

In full agreement with the general results, we find that for integral $k$-values
$r_k^{(d)} = R^{(c)}(k)$.

On the other hand, in case a discrete process is given, and has $W(x)$ as defined
by (124) for generating function, one of the interpolating correlation functions will
be obtained by taking $R^{(c)}(u) = r_u^{(d)}$, where $r_k^{(d)}$ for $-\infty < k < \infty$ is given by (125).
The corresponding function $V(x)$ is, of course, equal to $\frac{1}{\pi} W(x)$ in $(0, \pi)$ and equal
to 1 in $(\pi, \infty)$.
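
The illustration lends itself to a numerical cross-check. The following sketch is an assumption-laden aid rather than part of the original text: it compares a truncated version of the sum (122) with a closed form for $W(x)$ obtained by summing the geometric series, and compares formula (125) with a midpoint-rule evaluation of (115). All function names are invented for the example.

```python
import math


def V(y):
    """V(y) = 1 - exp(-y) for y > 0, and 0 otherwise."""
    return 1.0 - math.exp(-y) if y > 0.0 else 0.0


def W_sum(x, terms=50):
    """W(x) from the sum (122), truncated after `terms` terms."""
    return math.pi * sum(
        V(2.0 * math.pi * n + x) - V(2.0 * math.pi * n - x) for n in range(terms)
    )


def W_closed(x):
    """Closed form of the sum (122) for V(x) = 1 - exp(-x)."""
    tail = (math.exp(x) - math.exp(-x)) / (math.exp(2.0 * math.pi) - 1.0)
    return math.pi * (1.0 - math.exp(-x) + tail)


def W_prime(x):
    """Derivative of W_closed."""
    tail = (math.exp(x) + math.exp(-x)) / (math.exp(2.0 * math.pi) - 1.0)
    return math.pi * (math.exp(-x) + tail)


def r_formula(k):
    """Formula (125), for real k."""
    denom = (math.exp(math.pi) - math.exp(-math.pi)) * (1.0 + k * k)
    return 1.0 / (1.0 + k * k) + 2.0 * k * math.sin(math.pi * k) / denom


def r_quadrature(k, steps=20000):
    """(1/pi) * integral over (0, pi) of cos(kx) W'(x) dx, cf. (115)."""
    h = math.pi / steps
    total = sum(
        math.cos(k * (i + 0.5) * h) * W_prime((i + 0.5) * h) for i in range(steps)
    )
    return total * h / math.pi


print(W_sum(1.0), W_closed(1.0))
print(r_formula(0.5), r_quadrature(0.5))
print([r_formula(k) for k in range(4)])   # equals 1 / (1 + k^2) at integers
```

At integral $k$ the sine term vanishes and the values reduce to $1/(1+k^2)$, in line with $r_k^{(d)} = R^{(c)}(k)$; at non-integral $k$ the second term of (125) contributes.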

The second application of formula (115) is concerned with the
periodogram method for graduating a sample series section connected
with a stationary process. Only discrete processes will be considered;
the arguments also hold, however, in the continuous case.

First, an investigation will be made as to whether a Schuster
periodogram analysis of a sample series section, say $\xi_{t-1}, \ldots, \xi_{t-n}$,
may be expected to be effective, in particular for large $n$-values.
It will turn out that if the generating function of the process has
a non-vanishing saltus component, the periodogram analysis will
give positive results, viz. in the sense that the expectance of certain
well-defined periodogram ordinates will be positive. On the other
hand, in case the corollary of theorem 5 applies, a periodogram
analysis will prove resultless.

Let $(\xi_{t-1}, \xi_{t-2}, \ldots)$ stand for a sample series belonging to a
discrete stationary process $\{\xi(t)\}$ with dispersion $\sigma$, mean $m$, and
generating function $W(x)$.

Applying the classical Schuster periodogram analysis to the
sample series section $(\xi_{t-1}, \xi_{t-2}, \ldots, \xi_{t-n})$, let the resulting
periodogram functions be denoted by (see (26) and (27))


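$$A(n, \lambda) = \frac{2}{n} \sum_{p=1}^{n} (\xi_{t_0+p} - m) \cos \lambda p ; \qquad B(n, \lambda) = \frac{2}{n} \sum_{q=1}^{n} (\xi_{t_0+q} - m) \sin \lambda q ;$$

(126)    $$C^2(n, \lambda) = A^2(n, \lambda) + B^2(n, \lambda),$$

where $t_0 = t - n - 1$. Studying in the first place the expression

$$E = \lim_{n \to \infty} E[C^2(n, \lambda)],$$

elementary transformations yield

(127)    $$E[C^2(n, \lambda)] = \frac{4}{n^2} \sum_{p=1}^{n} \sum_{q=1}^{n} (\cos \lambda p \cdot \cos \lambda q + \sin \lambda p \cdot \sin \lambda q)\, E[(\xi(t_0 + p) - m)(\xi(t_0 + q) - m)]$$
$$= \frac{4\sigma^2}{n^2} \sum_{p=1}^{n} \sum_{q=1}^{n} r_{|p-q|} \cos \lambda (p - q) = \frac{4\sigma^2}{n} \sum_{k=-n+1}^{n-1} \frac{n - |k|}{n}\, r_k \cos \lambda k$$
$$= \frac{4\sigma^2}{n} \Big[ 1 + 2 \sum_{k=1}^{n-1} \Big( 1 - \frac{k}{n} \Big) r_k \cos \lambda k \Big] .$$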
n L \ nJ J 
In case the process $\{\xi(t)\}$ is non-autocorrelated, we have $r_k = 0$ for $k > 0$. The
above formula then reduces to the Schuster formula (87).
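
The equality of the double sum and the final bracket in (127) is easy to confirm numerically. The sketch below is illustrative only: it assumes the expectation is already expressed through $\sigma$ and the coefficients $r_k$, uses a hypothetical sequence $r_k = 0.6^{|k|}$, and also checks that the bracket equals the arithmetical mean of the first $n$ symmetric partial sums of $\sum r_k \cos \lambda k$. Function names are invented for the example.

```python
import math


def expected_ordinate_double_sum(n, lam, r, sigma=1.0):
    """E[C^2(n, lam)] as the double sum over p, q in (127)."""
    total = 0.0
    for p in range(1, n + 1):
        for q in range(1, n + 1):
            total += r(abs(p - q)) * math.cos(lam * (p - q))
    return 4.0 * sigma ** 2 / n ** 2 * total


def expected_ordinate_single_sum(n, lam, r, sigma=1.0):
    """E[C^2(n, lam)] in the final single-sum form of (127); assumes r(0) = 1."""
    bracket = 1.0 + 2.0 * sum(
        (1.0 - k / n) * r(k) * math.cos(lam * k) for k in range(1, n)
    )
    return 4.0 * sigma ** 2 / n * bracket


def cesaro_mean(n, lam, r):
    """Mean of the first n symmetric partial sums of sum_k r_k cos(lam k)."""
    partial = r(0)
    total = 0.0
    for s in range(n):
        if s > 0:
            partial += 2.0 * r(s) * math.cos(lam * s)
        total += partial
    return total / n


def geometric(k):
    """Hypothetical autocorrelation sequence r_k = 0.6^|k|."""
    return 0.6 ** abs(k)


def purely_random(k):
    """Non-autocorrelated case: r_0 = 1, r_k = 0 otherwise."""
    return 1.0 if k == 0 else 0.0


n, lam = 30, 0.7
print(expected_ordinate_double_sum(n, lam, geometric))
print(expected_ordinate_single_sum(n, lam, geometric))
print(4.0 / n * cesaro_mean(n, lam, geometric))
print(expected_ordinate_single_sum(n, lam, purely_random))   # 4 sigma^2 / n
```

In the non-autocorrelated case the ordinate expectation is $4\sigma^2/n$ whatever the value of $\lambda$.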

In a study on sampling problems in intercorrelated series, E. Slutsky (1934)
investigates, i.a., expectations of type $E[C^2(n, \lambda)]$ for $\lambda$-values equalling multiples
of $2\pi/n$, i.e. expectations connected with the Fourier coefficients of a sample
series section $(\xi_1, \ldots, \xi_n)$ (cf. (25)). Under certain restrictive conditions concerning
the process considered, Slutsky gives the relations (127).

J. Bartels (1935) seems to be the first to have deduced the relations (127) without
setting restrictive conditions on the process analysed.⁶

In order to avoid discussions which might obscure the point of
the analysis, we shall now introduce a restriction concerning the
generating function of the autocorrelation coefficients, viz. that
$a^{(3)} = 0$, and that $\sum |r_k^{(1)}|$ is convergent. According to (121), the
latter assumption implies that $dW^{(1)}(\lambda)/d\lambda$ is finite for all $\lambda$.

It is seen that the limit expectation $E$ may be split up into
portions corresponding to (119), say $a^{(1)} \cdot E^{(1)}$, $a^{(2)} \cdot E^{(2)}$ and $a^{(3)} \cdot E^{(3)}$.
By the simplifying hypotheses made, we have $a^{(3)} = 0$.

Considering the relations (127), we observe that an elementary
transformation gives

(128)    $$1 + 2 \sum_{k=1}^{n-1} \Big( 1 - \frac{k}{n} \Big) r_k \cos \lambda k = \frac{1}{n} \sum_{s=0}^{n-1} \sum_{k=-s}^{s} r_k \cos \lambda k ,$$

and that the right member is a Cesàro mean, viz. the arithmetical
mean of the first $n$ partial sums $\sum_{k=-s}^{s} r_k \cos \lambda k$ of the series appearing
in (121). After substitution of $r_k^{(1)}$ for $r_k$, the expression (128) thus
will tend to $dW^{(1)}(\lambda)/d\lambda$ as $n \to \infty$. As by hypothesis this derivative
is bounded in modulus, we conclude that for all $\lambda$ in $(0, \pi)$

(129)    $$E^{(1)} = \frac{4\sigma^2}{n} \cdot \frac{dW^{(1)}(\lambda)}{d\lambda} + o(1/n) \to 0 \quad \text{as } n \to \infty .$$

Paying regard to (120), a similar argument gives

(130)    $$E^{(2)} = \lim_{n \to \infty} \frac{4\sigma^2}{n} \Big[ 1 + 2 \sum_{k=1}^{n-1} \Big( 1 - \frac{k}{n} \Big) r_k^{(2)} \cos \lambda k \Big] = 0 \quad \text{for } \lambda \neq \lambda_\nu ,$$
$$E^{(2)} = c_\nu^2 \cdot \sigma^2 / a^{(2)} \quad \text{for } \lambda = \lambda_\nu < \pi .$$

Thus, while an ordinate $C^2(n, \lambda_\nu)$ in the periodogram of a sample
series may vary from one series to another, its expectation is by a
simple limit relation connected with the saltus, $\frac{\pi}{2} c_\nu^2$, in $\lambda = \lambda_\nu$ of
the generating function $W(\lambda)$ of the autocorrelation coefficients.
It is seen from (129)-(130) that, under the assumptions made, it is
only if $a^{(2)} > 0$ that a periodogram analysis of a sample series will
be fruitful. Assuming that there are $s$ discontinuities in $W^{(2)}(\lambda)$,
the analysis will result in a composed harmonic,

$$x_n(t - k) = \sum_{\nu=1}^{s} A(n, \lambda_\nu) \cos (n + 1 - k) \lambda_\nu + \sum_{\nu=1}^{s} B(n, \lambda_\nu) \sin (n + 1 - k) \lambda_\nu ,$$

approximating the section $(\xi_{t-1}, \ldots, \xi_{t-n})$ analysed, and with
coefficients depending on the sample series considered and satisfying

$$\lim_{n \to \infty} \frac{1}{\sigma^2} E[A^2(n, \lambda_\nu) + B^2(n, \lambda_\nu)] = c_\nu^2 .$$

A standard measure of the deviation $\xi_{t-k} - x_n(t - k)$ is given by
the expectation $E$ defined by

$$E = E\Big[ \frac{1}{n} \sum_{k=1}^{n} [\xi(t - k) - x_n(t - k)]^2 \Big] .$$

Disregarding terms of order $1/n$, the coefficients $A$ and $B$ in
$x_n(t - k)$ will make $E$ a minimum (cf. (43)),

$$E_{\min} = \sigma^2 - \frac{1}{2} \sum_{\nu=1}^{s} c_\nu^2 \cdot \sigma^2 = \Big[ 1 - \frac{1}{2} \sum_{\nu=1}^{s} c_\nu^2 \Big] \cdot \sigma^2 .$$

This relation shows clearly the scope of the periodogram analysis.
In fact, in the special case $W(x) \equiv W^{(2)}(x)$, the approximating
function $x_n(t - k)$ will, for sufficiently large $n$-values, yield a fit as close
as desired.

On the other hand, it follows from the previous analysis that in
case $\sum |r_k|$ is convergent, then $W(x) = W^{(1)}(x)$, and $dW(x)/dx$ is
bounded. In this case, a periodogram analysis of the section $(\xi_{t-1},
\xi_{t-2}, \ldots)$ will be resultless (cf. (129)).⁷ Processes of this type
have to be attacked by the use of entirely different methods.
Now, a method at once suggesting itself is that of linear regression
analysis. For instance, approximating $\xi(t)$ linearly by means
of $\xi(t-1)$ we obtain $\xi(t) = r_1 \cdot \xi(t-1) + \eta(t)$, denoting by $\eta(t)$ a
residual variable with variance $\sigma^2 \cdot (1 - r_1^2)$. Otherwise expressed,
$r_1 \cdot \xi_{t-1}$ yields an effective prognosis of $\xi_t$ with a squared dispersion
equalling $E[(\xi_t - r_1 \cdot \xi_{t-1})^2] = \sigma^2 \cdot (1 - r_1^2)$. This simple instance is
sufficient for exemplifying that this method is of quite another
type than the periodogram analysis. The next stage of the present
study will be to follow up this line of research; this is done in
section 19. The coming section is reserved for a preparatory survey.
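
The one-step prognosis can be checked by direct minimization. The sketch below is a simple illustration, assuming a process with mean zero so that $E[(\xi_t - a\,\xi_{t-1})^2] = \sigma^2 (1 - 2 a r_1 + a^2)$; the value $r_1 = 0.8$ and the search grid are hypothetical choices.

```python
def squared_prediction_error(a, r1, sigma=1.0):
    """E[(xi_t - a * xi_{t-1})^2] for a zero-mean stationary process."""
    return sigma ** 2 * (1.0 - 2.0 * a * r1 + a * a)


r1 = 0.8                                          # hypothetical first autocorrelation
grid = [i / 1000.0 for i in range(-1000, 1001)]   # candidate coefficients in [-1, 1]
best_a = min(grid, key=lambda a: squared_prediction_error(a, r1))

print(best_a)                                     # the minimizing coefficient
print(squared_prediction_error(best_a, r1))       # compare with 1 - r1**2
```

The grid search settles on $a = r_1$, and the minimum equals $\sigma^2 (1 - r_1^2)$, the residual variance quoted in the text.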

For illustrations of periodogram analysis of model series, reference
is given to section 25. For the present we only remark that the
general formula (127) gives

(131)    $$\frac{1}{\pi} \int_0^{\pi} E[C^2(n, \lambda)]\, d\lambda = \frac{4\sigma^2}{n} .$$

Comparing with (37), the expectation of the ordinates in a periodogram
is seen to be a function of $\lambda$ with mean value equalling the
constant (in respect of $\lambda$) expectance in the case of a purely random
process.

It is rather interesting to compare the relations (131) and (130).
It is seen that (131) also holds in the case of a process of hidden
periodicities, and that in such a process the ordinates $E[C^2(n, \lambda_\nu)]$
will not tend to zero as $n \to \infty$. We conclude that the rise to a
maximum in $E[C^2(n, \lambda)]$ will be very rapid, and that the corresponding
peaks in the periodogram will be very thin, with a
breadth of an order of magnitude not surpassing $1/n$.

On the other hand, if the corollary to theorem 5 applies, the
expectation $E[C^2(n, \lambda)]$ evidently tends to zero uniformly in $\lambda$ as
$n \to \infty$;

$$E[C^2(n, \lambda)] = \frac{4 W'(\lambda)}{n} \cdot \sigma^2 + o(1/n).$$

In full agreement with (121) we conclude from (131) that in the
present case $\frac{1}{\pi} \int_0^{\pi} W'(\lambda)\, d\lambda = 1$.


18. On linear approximation in a space of random variables. 

The next section presenting a generalization of the regression 
analysis of ordinary random variables to the general discrete sta- 
tionary process, the present section is reserved for an interpretation 
of the ordinary regression analysis as a linear approximation in a 
metrical space. The interpretation applies, of course, to statistical 
as well as to aleatory random variables. 

When dealing in the following with a set of random variables,
say $\xi$, we shall tacitly assume that any finite sub-group, say $[\xi^{(1)}, \ldots,
\xi^{(n)}]$, forms a well-defined multi-dimensional variable, say with
distribution function $F(u_1, \ldots, u_n)$, and that the functions $F$ are consistent
(cf. p. 32 f.). As observed by M. Fréchet ((1937), p. 205 f.), a set of
one-dimensional random variables with finite dispersion can be made
metrical by defining the distance between two variables in the set,
say $\xi^{(i)}$ and $\xi^{(k)}$, as the dispersion of the difference variable $\xi^{(i)} - \xi^{(k)}$:

(132)    $$|\xi^{(i)} - \xi^{(k)}| = D(\xi^{(i)} - \xi^{(k)}).$$

This fact depends on the triangular inequality

(133)    $$D(\xi^{(1)} - \xi^{(2)}) + D(\xi^{(1)} - \xi^{(3)}) \geq D(\xi^{(2)} - \xi^{(3)}),$$

which by elementary transformations reduces to the inequality of
Schwarz.

Adopting the distance definition (132), next let $a$ stand for a real
number, and let $\xi$ and $\eta$ be two random variables. Then we have

(134)    $$|\xi - a \cdot \eta|^2 = D^2(\xi) - 2a \cdot r(\xi, \eta) \cdot D(\xi) \cdot D(\eta) + a^2 D^2(\eta).$$

This squared distance will reach a minimum equalling

(135)    $$[1 - r^2(\xi, \eta)] \cdot D^2(\xi)$$

when $a$ equals the regression coefficient of $\xi$ on $\eta$, i.e. when

(136)    $$a = r(\xi, \eta) \cdot D(\xi) / D(\eta).$$

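This regression coefficient is linear in respect to $\xi$. In fact,

(137)    $$r(\xi + \zeta, \eta) \cdot D(\xi + \zeta) = \frac{E[(\xi + \zeta - E[\xi + \zeta])(\eta - E[\eta])]}{D(\eta)}$$
$$= \frac{E[(\xi - E[\xi])(\eta - E[\eta])] + E[(\zeta - E[\zeta])(\eta - E[\eta])]}{D(\eta)}$$
$$= r(\xi, \eta) \cdot D(\xi) + r(\zeta, \eta) \cdot D(\zeta).$$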
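
Formulae (134)-(137) can be verified on a small example. The sketch below is an illustration under an explicit assumption: the three short lists are hypothetical data, treated as equally probable outcomes of three random variables, so that expectations are plain averages. The helper names are invented for the example.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def disp(x):
    """Dispersion D(x), i.e. the standard deviation."""
    return math.sqrt(cov(x, x))


def corr(x, y):
    return cov(x, y) / (disp(x) * disp(y))


def regression_coefficient(x, y):
    """Coefficient (136) of x on y: r(x, y) * D(x) / D(y)."""
    return corr(x, y) * disp(x) / disp(y)


def squared_distance(x, y, a):
    """|x - a*y|^2 in the metric (132), i.e. D^2(x - a*y)."""
    d = [xi - a * yi for xi, yi in zip(x, y)]
    return cov(d, d)


# Hypothetical outcomes, each with probability 1/5.
xi = [1.0, 3.0, 2.0, 5.0, 4.0]
zeta = [2.0, 0.0, 1.0, 1.0, 3.0]
eta = [1.0, 2.0, 2.0, 4.0, 3.0]

a = regression_coefficient(xi, eta)
minimum = squared_distance(xi, eta, a)
predicted_minimum = (1.0 - corr(xi, eta) ** 2) * disp(xi) ** 2      # (135)

combined = [u + v for u, v in zip(xi, zeta)]
lhs = regression_coefficient(combined, eta)                          # linearity
rhs = regression_coefficient(xi, eta) + regression_coefficient(zeta, eta)

print(minimum, predicted_minimum)
print(lhs, rhs)
```

Moving $a$ slightly away from the regression coefficient increases the squared distance, which is the minimum property stated in (135).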


Using these properties of regression coefficients, the multiple
regression theory founded and developed by the English statistical
school (see G. U. Yule and M. G. Kendall (1937), p. 511, for
references) can be interpreted as a particular branch of the theory of
approximations in general linear spaces. In the general terminology,
uncorrelated variables $\xi^{(i)}$ and $\xi^{(k)}$ should be called orthogonal
elements in the space to which they belong. For later application we
shall record some general approximation formulae in terms of
determinants. The verification being in detail parallel to the
Gram-Schmidt orthogonalization of vectors in an infinite number of
dimensions, reference is again made to G. Kowalewski ((1909), § 175).⁸
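Let an $n$-dimensional variable $[\xi^{(1)}, \ldots, \xi^{(n)}]$ formed by variables
with finite dispersion be linearly non-singular (see p. 41 f.). Considering
an arbitrary variable $\xi^{(0)}$ with finite dispersion, and writing
$m_i = E[\xi^{(i)}]$; $\mu_{ik} = E[(\xi^{(i)} - m_i)(\xi^{(k)} - m_k)]$, there exists a well-defined
sequence of coefficients $a_{in}$ minimizing the dispersion of the variable
defined for arbitrary $a_{in}$'s by

(138)    $$\xi^{(0)} - m_0 - a_{1n} \cdot (\xi^{(1)} - m_1) - \cdots - a_{nn} \cdot (\xi^{(n)} - m_n).$$

Terming residual, and denoting by $\eta^{(n)}$, the variable (138) formed
by the minimizing coefficients $a_{in}$, we have $E[\eta^{(n)}] = 0$, and

(139)    $$\eta^{(n)} = \frac{\begin{vmatrix} \mu_{11} & \cdots & \mu_{1n} & \xi^{(1)} - m_1 \\ \vdots & & \vdots & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} & \xi^{(n)} - m_n \\ \mu_{01} & \cdots & \mu_{0n} & \xi^{(0)} - m_0 \end{vmatrix}}{\begin{vmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} \end{vmatrix}} ; \qquad D^2(\eta^{(n)}) = \frac{\begin{vmatrix} \mu_{11} & \cdots & \mu_{1n} & \mu_{10} \\ \vdots & & \vdots & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} & \mu_{n0} \\ \mu_{01} & \cdots & \mu_{0n} & \mu_{00} \end{vmatrix}}{\begin{vmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & & \vdots \\ \mu_{n1} & \cdots & \mu_{nn} \end{vmatrix}} .$$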

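A set of formulae equivalent to (139), and involving an auxiliary
set of one-dimensional aleatory variables $\zeta^{(1)}, \ldots, \zeta^{(n)}$, is given by

(140)   $\eta^{(n)} = \xi^{(0)} - m_0 - c_1 \cdot \zeta^{(1)} - c_2 \cdot \zeta^{(2)} - \cdots - c_n \cdot \zeta^{(n)},$

(141)   $D^2(\eta^{(n)}) = D^2(\xi^{(0)}) - c_1^2 - c_2^2 - \cdots - c_n^2.$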


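For the auxiliary variables we have (cf. G. Kowalewski (1909)
p. 426 and T. Lindblad (1937)),

(142)   $\zeta^{(1)} = \dfrac{\xi^{(1)} - m_1}{\sqrt{\mu_{11}}}$;

        $\zeta^{(2)} = \begin{vmatrix} \mu_{11} & \xi^{(1)} - m_1 \\ \mu_{21} & \xi^{(2)} - m_2 \end{vmatrix} : \sqrt{\mu_{11} \cdot \begin{vmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{vmatrix}}$;

        $\zeta^{(3)} = \begin{vmatrix} \mu_{11} & \mu_{12} & \xi^{(1)} - m_1 \\ \mu_{21} & \mu_{22} & \xi^{(2)} - m_2 \\ \mu_{31} & \mu_{32} & \xi^{(3)} - m_3 \end{vmatrix} : \sqrt{\begin{vmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{vmatrix} \cdot \begin{vmatrix} \mu_{11} & \mu_{12} & \mu_{13} \\ \mu_{21} & \mu_{22} & \mu_{23} \\ \mu_{31} & \mu_{32} & \mu_{33} \end{vmatrix}}$; etc.

These variables are standardised, i. e.

(143)   $E[\zeta^{(i)}] = 0$;   $D[\zeta^{(i)}] = 1.$

They are further mutually non-correlated,

(144)   $r(\zeta^{(i)}, \zeta^{(k)}) = E[\zeta^{(i)} \cdot \zeta^{(k)}] = 0$   for $i \neq k$,

and it is thanks to this relation that the coefficients $c_i$ in (140) are
independent of $n$,

(145)   $c_i = r(\xi^{(0)}, \zeta^{(i)}) \cdot D(\xi^{(0)}) = E[(\xi^{(0)} - m_0) \cdot \zeta^{(i)}].$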
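The relations (141) and (143)-(145) lend themselves to a small numerical illustration. The Python sketch below is an assumption-laden example, not part of the original argument: the moment matrix `MU` is hypothetical, and the auxiliary variables are obtained by a step-by-step Gram-Schmidt orthogonalization of the centred variables (which yields the same variables as the determinant formulae (142) in the non-singular case). Each variable is represented by its coefficient vector over the centred variables of index 0, 1, 2.

```python
import math

# Hypothetical moments mu[i][k]; index 0 is the variable being approximated.
MU = [[4.0, 2.0, 1.0],
      [2.0, 3.0, 1.0],
      [1.0, 1.0, 2.0]]

def inner(u, v):
    """E[U * V] for centred linear forms with coefficient vectors u and v."""
    size = len(MU)
    return sum(u[i] * MU[i][k] * v[k] for i in range(size) for k in range(size))

def auxiliary_variables(n):
    """Coefficient vectors of the standardised, mutually uncorrelated variables."""
    zetas = []
    for j in range(1, n + 1):
        v = [1.0 if i == j else 0.0 for i in range(len(MU))]
        for z in zetas:
            proj = inner(v, z)                 # remove the component along z
            v = [a - proj * b for a, b in zip(v, z)]
        norm = math.sqrt(inner(v, v))          # standardise to unit dispersion
        zetas.append([a / norm for a in v])
    return zetas

X0 = [1.0, 0.0, 0.0]                            # the centred variable of index 0
ZETAS = auxiliary_variables(2)
C = [inner(X0, z) for z in ZETAS]               # coefficients c_i as in (145)
RESIDUAL = MU[0][0] - sum(c * c for c in C)     # right member of (141)
```

Because each coefficient depends only on its own auxiliary variable, adding a further approximating variable leaves the earlier coefficients unchanged; this is the independence of $n$ stated above.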

On the other hand, in case $[\xi^{(1)}, \ldots, \xi^{(n)}]$ is singular, say of rank
$h$ [i. e., by relations of the type (69)], the coefficients $a_{in}$ in the
residual variable $\eta^{(n)}$ will not be uniquely determined. In fact, the
addition of an arbitrary linear combination of the vanishing sums
(69) will not change the variable (138). In full agreement herewith,
the expressions (139) become indeterminate when $[\xi^{(1)}, \ldots, \xi^{(n)}]$ is taken
to be singular.

Whether or not $[\xi^{(1)}, \ldots, \xi^{(n)}]$ be singular, the residual variable $\eta^{(n)}$
will be uncorrelated with every $\xi^{(i)}$. In fact, formulae (134)-(136)
imply that otherwise the variance $D^2(\eta^{(n)})$ could be brought down
by subtraction in $\eta^{(n)}$ of the non-vanishing variable

        $r(\eta^{(n)}, \xi^{(i)}) \cdot \dfrac{D(\eta^{(n)})}{D(\xi^{(i)})} \cdot [\xi^{(i)} - m_i].$

Considering the residuals $\eta^{(1)}, \ldots, \eta^{(n)}$, it follows from (141)
that the variances $D^2(\eta^{(1)}), D^2(\eta^{(2)}), \ldots, D^2(\eta^{(n)})$ form a non-increasing
sequence. It is of central importance that, when $D^2(\eta^{(i)}) > 0$, we
shall have $D^2(\eta^{(i+1)}) = D^2(\eta^{(i)})$ if, and only if, one of the following two
cases is present: 9

(A) $[\xi^{(1)}, \ldots, \xi^{(i+1)}]$ is singular by means of a relation of type

(146)   $p_1 \cdot (\xi^{(1)} - m_1) + \cdots + p_i \cdot (\xi^{(i)} - m_i) = \xi^{(i+1)} - m_{i+1}.$

(B) No relation of type (146) exists, but $\xi^{(i+1)}$ is uncorrelated
with $\eta^{(i)}$.

It follows, i. a., that if $[\xi^{(1)}, \ldots, \xi^{(n)}]$ is singular of rank $h$, the
variables may be arranged in an order such that the coefficients
$a_{pq}$ in the residuals $\eta^{(1)}, \eta^{(2)}, \ldots, \eta^{(h)}$, and only in these, are uniquely
determined, while

(147)   $D(\xi^{(0)}) \geq D(\eta^{(1)}) \geq D(\eta^{(2)}) \geq \cdots \geq D(\eta^{(h)}) = D(\eta^{(h+1)}) = \cdots = D(\eta^{(n)}) \geq 0.$

In other words, the first $h$ variables alone will be able to
bring down the residual dispersion to its minimum value.

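The following determinant expression for a general partial
correlation coefficient in the notation of G. U. Yule (see G. U. Yule
and M. G. Kendall (1937), p. 269) will attach the above system
of formulae to the familiar theory of multiple correlation. The
formula is valid in case none of the variables $[\xi^{(0)}, \xi^{(2)}, \xi^{(3)}, \ldots, \xi^{(n)}]$
and $[\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(n)}]$ is singular.

        $r_{01.23 \ldots n} = \begin{vmatrix} \mu_{01} & \mu_{02} & \cdots & \mu_{0n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{vmatrix} \Bigg/ \sqrt{\begin{vmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{vmatrix} \cdot \begin{vmatrix} \mu_{00} & \mu_{02} & \cdots & \mu_{0n} \\ \mu_{20} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{n0} & \mu_{n2} & \cdots & \mu_{nn} \end{vmatrix}}$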

A proof from $n$ to $n + 1$ will verify

        $D^2(\eta^{(n)}) = \mu_{00} \cdot (1 - r_{01}^2)(1 - r_{02.1}^2) \cdots (1 - r_{0n.12 \ldots (n-1)}^2),$

a formula which shows, i. a., that also the partial correlation
coefficients lie in the interval $(-1, 1)$.
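As an illustration of the product formula, the Python sketch below evaluates the squared partial correlation coefficients from determinants of moments and multiplies out the factors. It is a sketch under assumptions: the moment matrix `MU` is hypothetical, the function names are introduced here for illustration only, and the squared coefficient is computed (rather than the coefficient itself) so that the arithmetic stays exact.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact (fraction-based) Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, value = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            value = -value
        value *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return value

# Hypothetical moments mu[i][k]; index 0 is the variable being approximated.
MU = [[4, 2, 1],
      [2, 3, 1],
      [1, 1, 2]]

def minor(mu, rows, cols):
    """Determinant of the moments with the given row and column indices."""
    return det([[mu[i][k] for k in cols] for i in rows])

def partial_r_squared(mu, i, j, given):
    """Square of the partial correlation of i and j, the indices in `given` held fixed."""
    numerator = minor(mu, [i] + given, [j] + given)
    denominator = (minor(mu, [i] + given, [i] + given)
                   * minor(mu, [j] + given, [j] + given))
    return numerator * numerator / denominator

def residual_variance_product(mu):
    """mu_00 (1 - r_01^2)(1 - r_02.1^2) ... as in the product formula."""
    n = len(mu) - 1
    value = Fraction(mu[0][0])
    for j in range(1, n + 1):
        value *= 1 - partial_r_squared(mu, 0, j, list(range(1, j)))
    return value
```

With the assumed moments the product reproduces the residual dispersion obtained from the determinant ratio (139), namely 13/5.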

For later application, we record that

(148)   $a_{1n} = r_{01.23 \ldots n} \cdot \dfrac{D_{0.23 \ldots n}}{D_{1.23 \ldots n}},$

and analogous formulae hold for the remaining coefficients $a_{in}$
(cf. G. U. Yule and M. G. Kendall (1937), p. 266). Here $D_{i.23 \ldots n}$
for $i = 0$ or $1$ represents the dispersion of the residual variable
obtained when approximating $\xi^{(i)}$ by the variables $\xi^{(2)}, \xi^{(3)}, \ldots, \xi^{(n)}$.

For application in forecast problems, let us interpret (138) in
terms of conditioned expectations. Writing $(C) = (\xi^{(1)} = \xi_1, \ldots,
\xi^{(n)} = \xi_n)$, an estimate of $\xi_C^{(0)}$, which is the best linear one in the
sense of the principle of the least squares, is yielded by

(149)   $F_C[\xi^{(0)}] = m_0 + a_{1n}(\xi_1 - m_1) + \cdots + a_{nn}(\xi_n - m_n).$

We have


and

        $E[\xi_C^{(0)} - F_C[\xi^{(0)}]]^2 = E[\eta^{(n)}]^2 = D^2(\eta^{(n)}),$

where the latter relation results from (57).

If the variables $\xi^{(1)}, \ldots, \xi^{(n)}$ are mutually uncorrelated, the
coefficients $a_{ik}$ are independent of $k$. Writing $a_{ik} = a_i$, and taking
$(C) = (\xi^{(i_1)} = \xi_{i_1}, \ldots, \xi^{(i_k)} = \xi_{i_k})$, we get in this case

(150)   $F_C[\xi^{(0)}] = m_0 + a_{i_1}(\xi_{i_1} - m_{i_1}) + \cdots + a_{i_k}(\xi_{i_k} - m_{i_k}).$

Further, if $\eta^{(n)}$ is independent of the variables $\xi^{(1)}, \ldots, \xi^{(n)}$
we have

(151)   $E_C[\eta^{(n)}] = E[\eta^{(n)}] = 0,$

and

(152)   $E_C[\xi^{(0)}] = m_0 + a_{1n}(\xi_1 - m_1) + \cdots + a_{nn}(\xi_n - m_n) + E_C[\eta^{(n)}] = F_C[\xi^{(0)}].$


19. Linear autoregression analysis of the discrete stationary process. 

In this section, the linear regression analysis as surveyed in the
previous section will be applied to the variables $\xi(t)$ connected with
a stationary process $\{\xi(t)\}$. Approximating $\xi(t)$ by means of
$\xi(t-1), \ldots, \xi(t-n)$, a well-defined procedure of consecutive
approximations will be given. After a passage to the limit, we shall arrive
at a residual variable with properties corresponding to the case
of a finite number of approximations.

Let $\{\xi(t)\}$ be a stationary process with finite dispersion $\sigma$, with
mean $m$, and with principal correlation determinants $\Delta(r, n)$ given
by (81). Approximating $\xi(t)$ by means of $\xi(t-1, \ldots, t-n) =
[\xi(t-1), \ldots, \xi(t-n)]$ as described in the previous section, let the
residual variable be given by

(153)   $\eta(t; n) = \xi(t) - m - a(1, n) \cdot [\xi(t-1) - m] - a(2, n) \cdot [\xi(t-2) - m] - \cdots - a(n, n) \cdot [\xi(t-n) - m].$

In case $\Delta(r, n-1) \neq 0$, formula (139) yields

(154)   $D^2(\xi(t)) \geq D^2(\eta(t; n)) = \sigma^2 \cdot \Delta(r, n) / \Delta(r, n-1) \geq 0.$

Since $D^2(\eta(t; n))$, $n = 1, 2, \ldots$ forms a non-increasing sequence (cf.
(147)), we get

(155)   $1 \geq \dfrac{\Delta(r, 1)}{1} \geq \dfrac{\Delta(r, 2)}{\Delta(r, 1)} \geq \cdots \geq \dfrac{\Delta(r, n)}{\Delta(r, n-1)} \geq \cdots \geq 0.$
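The coefficients $a(i, n)$ of (153) and the non-increasing sequence of residual variances can be illustrated by solving the normal equations of the regression directly. The Python sketch below rests on assumptions: the correlogram ($r_1 = 2/5$, $r_k = 0$ for $k \geq 2$) is a hypothetical example, the mean is taken as zero and the dispersion as unity, and the linear system is solved by plain Gauss-Jordan elimination rather than by the determinant formulae of the text.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a non-singular linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c:
                aug[r] = [x - aug[r][c] * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def autocorrelation(k):
    """Hypothetical correlogram: r_0 = 1, r_1 = 2/5, r_k = 0 for k >= 2."""
    return {0: Fraction(1), 1: Fraction(2, 5)}.get(abs(k), Fraction(0))

def autoregression(n):
    """Coefficients a(1,n)..a(n,n) and the ratio of residual to total variance."""
    gram = [[autocorrelation(i - k) for k in range(n)] for i in range(n)]
    rhs = [autocorrelation(i + 1) for i in range(n)]
    a = solve(gram, rhs)
    ratio = 1 - sum(a_i * r_i for a_i, r_i in zip(a, rhs))
    return a, ratio

RATIOS = [autoregression(n)[1] for n in (1, 2, 3)]
```

For this correlogram the ratios are 21/25, 17/21 and 341/425, a decreasing sequence which agrees with the quotients of consecutive correlation determinants in (154).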


According to the analysis in section 14, one of two cases is
present. Either $\Delta(r, n)$ is above zero for all $n$, or $\Delta(r, n)$ is $> 0$
for $n < h$ while $\Delta(r, n) = 0$ for $n \geq h$. The number $h$ appearing in
the latter case equals the rank of linear singularity of the process
considered. Paying regard to these facts, and to the relation (155),
it is evident that any stationary process belongs to one, and only
to one, of the following classes:

(I). The process is non-singular, and there exists a positive constant
$\kappa^2 \leq 1$ such that

(156)   $D^2(\eta(t; n)) / \sigma^2 = \Delta(r, n) / \Delta(r, n-1) \to \kappa^2 \leq 1$   as $n \to \infty$.

(II). The process is singular, say of rank $h$.

(III). The process presents no singularity of finite rank, but

(157)   $D^2(\eta(t; n)) / \sigma^2 = \Delta(r, n) / \Delta(r, n-1) \to 0$   as $n \to \infty$.

In this case, the process will be termed singular of infinite rank. 10
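The first two classes may be illustrated by evaluating the principal correlation determinants for two simple correlograms. The Python sketch below is illustrative only; both correlograms are assumptions chosen for the example (a geometric correlogram $r_k = 2^{-k}$, and the alternating correlogram $r_k = (-1)^k$ that belongs to the relation $\xi(t) + \xi(t-1) = 0$), and $\Delta(r, n)$ is taken to be the determinant of order $n + 1$ formed by the elements $r_{|i-k|}$.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact (fraction-based) Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, value = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            value = -value
        value *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return value

def correlation_determinant(r, n):
    """Determinant of the (n+1) x (n+1) matrix with elements r(|i - k|)."""
    return det([[r(abs(i - k)) for k in range(n + 1)] for i in range(n + 1)])

def singular_rank(r, max_order):
    """Smallest n <= max_order with vanishing determinant, or None if there is none."""
    for n in range(1, max_order + 1):
        if correlation_determinant(r, n) == 0:
            return n
    return None

def geometric(k):
    """Hypothetical non-singular correlogram r_k = (1/2)^k."""
    return Fraction(1, 2) ** k

def alternating(k):
    """Hypothetical singular correlogram r_k = (-1)^k."""
    return Fraction((-1) ** k)

GEOMETRIC_RATIOS = [correlation_determinant(geometric, n)
                    / correlation_determinant(geometric, n - 1) for n in range(1, 5)]
```

For the geometric correlogram the quotient in (156) equals 3/4 for every $n$, so that $\kappa^2 = 3/4$; for the alternating correlogram the determinant of order two already vanishes, i.e. the rank is $h = 1$.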

In the following analysis, we shall assume that the process 
considered belongs to the first class. 

Autoregression analysis of a non-singular process. According to
(140), the variables (153) may be written on the form

(158)   $\eta(t; n) = \xi(t) - m - c_1 \zeta_1(t) - c_2 \zeta_2(t) - \cdots - c_n \zeta_n(t),$

where the standardized variables $\zeta_i(t)$ are given by (cf. (142))

(159)   $\zeta_n(t) = \begin{vmatrix} 1 & r_1 & \cdots & r_{n-2} & \xi(t-1) - m \\ r_1 & 1 & \cdots & r_{n-3} & \xi(t-2) - m \\ \vdots & \vdots & & \vdots & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_1 & \xi(t-n) - m \end{vmatrix} : \sigma \sqrt{\Delta(r, n-1) \cdot \Delta(r, n-2)}$

and where the coefficients $c_i$ are uniquely determined, and
independent of $n$ (cf. (145)),

(160)   $c_i = r[\xi(t), \zeta_i(t)] \cdot \sigma.$

According to the general analysis in section 13, the variables
$\zeta_i(t)$ will, for an arbitrarily fixed $i$, constitute a stationary process
$\{\zeta_i(t)\}$. It is seen that the variables $\zeta_i(t)$ that form this process
will be of type (61). Similarly, for any $n$ the variables $\eta(t; n)$ will
constitute a stationary process $\{\eta(t; n)\}$ of the same type (61). It
will next be shown that the sequence $\{\eta(t; n)\}$ is convergent in
probability as $n \to \infty$.

According to (141) and (156) we have in the first place

        $D^2(\eta(t; n)) = \sigma^2 - c_1^2 - \cdots - c_n^2 \to \kappa^2 \sigma^2$   as $n \to \infty$.

We conclude that $\sum c_i^2$ is convergent, and that

(161)   $c_1^2 + c_2^2 + \cdots = (1 - \kappa^2) \sigma^2.$

Next, the variables $\zeta_1(t), \zeta_2(t), \ldots$ being uncorrelated (cf. (144)), we get

        $D^2(\eta(t; n+p) - \eta(t; n)) = c_{n+1}^2 + \cdots + c_{n+p}^2.$

Keeping in mind that $\sum c_i^2$ is convergent, it follows that this
dispersion tends to zero uniformly in $p$ as $n \to \infty$. Hence, paying regard
to a remark in section 13 (see p. 40 f.), we conclude that the
sequence $\eta(t; 1), \eta(t; 2), \ldots$ is convergent in probability. According
to theorem 1, this implies that the sequence

(162)   $\{\eta(t; 1)\}, \{\eta(t; 2)\}, \ldots$

will also converge in probability.

The limit process of the sequence (162) will be denoted $\{\eta(t)\}$.
In analogy to the case of a finite number of approximations, the
process $\{\eta(t)\}$ and the corresponding variables $\eta(t_1, \ldots, t_n)$ will be
termed residual. According to a remark on p. 41, the mean and
dispersion of the residual process will be given by the corresponding
limit characteristics of the sequence (162).

Observing that $\eta(t; n)$ is non-correlated with the variables
$\xi(t-1), \ldots, \xi(t-n)$, and keeping in mind that the dispersion of
$\eta(t; n)$ lies above a positive constant, it follows readily that the
limit residual $\eta(t)$ is non-correlated with $\xi(t-n)$ for any positive
integer $n$. Hence we conclude that $\eta(t)$ for any $k \geq 0$ is
uncorrelated with all of the variables $\zeta_1(t-k), \zeta_2(t-k), \ldots$ defined by
(159) (cf. (137)). Further we have

        $E[(\xi(t) - m) \cdot \eta(t)] = E[(\xi(t) - m - c_1 \zeta_1(t) - \cdots - c_n \zeta_n(t)) \cdot \eta(t)] =$

        $= \lim_{n \to \infty} E[\eta(t; n) \cdot \eta(t)] = E[\eta(t) \cdot \eta(t)] = D^2(\eta(t)).$

Hence, paying regard to the relations (cf. p. 77)

        $E[\eta(t)] = E[\eta(t; n)] = 0,$

we get

        $r(\xi(t), \eta(t)) = D(\eta(t)) / D(\xi(t)) = \kappa.$

Writing generally

(163)   $r(\xi(t+n), \eta(t)) / \kappa = b_n,$

we thus have $b_0 = 1$, and $b_n = 0$ for $n < 0$.

The above-mentioned properties of the residuals correspond
directly with the finite case dealt with in the previous section.
There is also another important analogy. Considering in the finite
case the residuals obtained when approximating $\xi^{(0)}$ and $\xi^{(p)}$ by
$[\xi^{(1)}, \ldots, \xi^{(n)}]$ and $[\xi^{(p+1)}, \ldots, \xi^{(n)}]$ respectively, a short reflection shows
that these residuals are non-correlated. In order to show,
correspondingly, that the residuals $\eta(t)$ and $\eta(t-p)$ are non-correlated,
let it be observed that

        $E[\eta(t) \cdot \eta(t-p)] = \lim_{k \to \infty} E[\eta(t; k) \cdot [\xi(t-p) - m - c_1 \zeta_1(t-p) - \cdots - c_{k-p} \zeta_{k-p}(t-p)$

        $- c_{k-p+1} \zeta_{k-p+1}(t-p) - \cdots - c_k \zeta_k(t-p)]].$

Denoting by $\varrho_{k-p}$ the sum appearing in the second row, we have
$E[\varrho_{k-p}] = 0$, and $D^2(\varrho_{k-p}) \to 0$ for any $p > 0$ as $k \to \infty$. Thus

        $r(\eta(t), \eta(t-p)) = \lim_{k \to \infty} r(\eta(t; k), \; \xi(t-p) - m - c_1 \zeta_1(t-p) - \cdots - c_{k-p} \zeta_{k-p}(t-p)).$

According to previous remarks, the correlation coefficient in the
right member equals zero for any $p > 0$. Observing that

(164)   $r(\eta(t), \eta(t+p)) = r(\eta(t-p), \eta(t)) = r(\eta(t), \eta(t-p)) = 0$

we conclude that the process $\{\eta(t)\}$ is non-autocorrelated.

Summing up the results, we get the following theorem.

Theorem 6. A residual process $\{\eta(t)\}$ obtained from a non-singular
stationary process $\{\xi(t)\}$ is stationary and non-autocorrelated. The
variable $\eta(t)$ is non-correlated with $\xi(t-1), \xi(t-2), \ldots$, while

        $r(\xi(t), \eta(t)) = D(\eta(t)) / D(\xi(t)).$

The arguments used in the proof of this theorem also apply in
the remaining cases II and III. As the residual variables $\eta(t)$ are
here seen to be vanishing, their correlation properties will be
indeterminate. Accordingly, these cases need no further comment.

Illustrations of the autoregression analysis of stationary processes 
will be given in sections 25 and 26 (cf. also p. 92). 


20. A canonical form of the discrete stationary process. 

In this section it will be shown that the residual processes arrived 
at in the preceding section give a basis for the construction of a 
canonical form of the stationary process with finite dispersion. 

Until further notice, the random variables considered will be 
assumed to have a vanishing mean. 

In the case of a finite number of approximations dealt with in
section 18, the following representation holds for the variable $\xi^{(0)}$
subjected to the regression analysis (see formula (138))

(165)   $\xi^{(0)} = \eta^{(n)} + a_1 \xi^{(1)} + a_2 \xi^{(2)} + \cdots + a_n \xi^{(n)}.$

Considering, on the other hand, the residuals $\eta(t; k)$ defined by
(153) and obtained from a stationary process $\{\xi(t)\}$, it may be (cf. (148))
that the minimizing coefficients $a(i, k)$ are bounded in modulus by
a constant not surpassing $1/\kappa$. In such a case, a diagonal selection
procedure will show that there exist a real sequence $a_1, a_2, \ldots$ and
a sequence of integers $k_1, k_2, \ldots$ such that for all $i$

        $\lim_{s \to \infty} a(i, k_s) = a_i, \qquad |a_i| \leq 1/\kappa.$

Hence, we are led to ask if for these coefficients $a_i$

        $\lim_{n \to \infty} [\eta(t) + a_1 \xi(t-1) + \cdots + a_n \xi(t-n)]$

exists and equals $\xi(t)$, a relation which would correspond to (165).
If the answer is affirmative, a non-singular stationary process $\{\xi(t)\}$
could always be written on the form

(166)   $\{\xi(t)\} = \{\eta(t)\} + a_1 \{\xi(t-1)\} + a_2 \{\xi(t-2)\} + \cdots$

where $\{\eta(t)\}$ is non-autocorrelated, and $\eta(t)$ is uncorrelated with
$\xi(t-1)$, $\xi(t-2)$, etc.

However, as will be seen in section 26, certain conditions would
have to be imposed upon $\{\xi(t)\}$ in order to secure the representation
(166). For the present we shall leave open all questions in this
matter, and proceed to an aspect of the finite case suggesting
another canonical form of the stationary process.

Writing

        $\vartheta(t; k) = \xi(t) - \eta(t; k),$

it is evident that the sequence $\{\vartheta(t; k)\}$ is convergent as $k \to \infty$.
Denoting the limit process by $\{\vartheta(t)\}$, we obtain

        $\{\xi(t)\} = \{\eta(t)\} + \{\vartheta(t)\}.$

Since $\vartheta(t; k)$ is a linear expression in $\xi(t-1), \ldots, \xi(t-k)$, and thus
uncorrelated with $\eta(t+n)$ for all $n \geq 0$, it follows that $\eta(t)$ is
uncorrelated with the variables $\vartheta(t)$, $\vartheta(t-1)$, etc. So far there is
a complete analogy with the finite relation (165). In further
analogy, $\vartheta(t; k)$ can be written as a sum involving the uncorrelated
residuals $\eta(t-1; k-1), \ldots, \eta(t-k+1; 1)$, and $\xi(t-k)$.
This circumstance suggests the question of whether $\{\vartheta(t)\}$ is a linear
expression in $\{\eta(t-1)\}$, $\{\eta(t-2)\}$, etc. Were the answer in the
affirmative, then $\{\xi(t)\}$ could always be written on the form $\{\eta(t)\} +
b_1 \{\eta(t-1)\} + b_2 \{\eta(t-2)\} + \cdots$ where $\{\eta(t)\}$ is non-autocorrelated.
It will be found that such a sum will not be sufficient as a
canonical form for the stationary process; in general, a singular
process which is uncorrelated with $\{\eta(t)\}$ has to be added
in order to obtain $\{\xi(t)\}$.

After these introductory remarks, let $\{\xi(t)\}$ represent an arbitrary
non-singular stationary process with finite dispersion $\sigma$, and with
zero for mean value. In the first place, let an approximation
procedure be performed on $\xi(t)$ by means of the residuals $\eta(t; k), \ldots,
\eta(t-n; k)$ given by (158). Denoting the new residuals by $\psi(t; n; k)$
we write

(167)   $\psi(t; n; k) = \xi(t) - b(0; n; k) \cdot \eta(t; k) - \cdots - b(n; n; k) \cdot \eta(t-n; k).$

Since $\xi(t)$ is non-singular, the coefficients $b$ minimizing $D(\psi(t; n; k))$
will be uniquely determined.

The processes $\{\psi(t; n; k)\}$ defined by (167) are, like $\{\eta(t; k)\}$, of
type (61). The processes $\{\psi(t; n; k)\}$ will next be subjected to a
repeated passage to the limit, first in respect of $k$, and then in
respect of $n$.

Letting $k$ tend to infinity, and paying regard to the relation

        $\lim_{k \to \infty} r(\eta(t-p; k), \eta(t-q; k)) = 0,$

which according to theorem 6 is valid for $p \neq q$, we get (cf. (163))

(168)   $\lim_{k \to \infty} b(p; n; k) = r(\xi(t), \eta(t-p)) / \kappa = b_p,$

independently of $n$. Thus, keeping in mind that the variable
$\eta(t, t-1, \ldots, t-n; k)$ tends to $\eta(t, t-1, \ldots, t-n)$ as $k \to \infty$, it follows
that for all $n$ and $t$

(169)   $\lim_{k \to \infty} \psi(t; n; k) = \xi(t) - \eta(t) - b_1 \eta(t-1) - \cdots - b_n \eta(t-n).$

Let the limit variables thus obtained be denoted by $\psi(t; n)$. Now,
holding $n$ fixed, the variables $\psi(t; n)$ will obviously constitute a
stationary process $\{\psi(t; n)\}$ which is the limit of the sequence
$\{\psi(t; n; 1)\}, \{\psi(t; n; 2)\}, \ldots$

Keeping in mind that the variables $\eta(t)$ are mutually uncorrelated,
a short calculation shows that

(170)   $D^2(\psi(t; n)) = [1 - (1 + b_1^2 + \cdots + b_n^2) \cdot \kappa^2] \, \sigma^2.$

Concluding that the series $\sum b_i^2$ is convergent, let us write

(171)   $K^2 = 1 + b_1^2 + b_2^2 + \cdots,$

and further

(172)   $\zeta(t; n) = \eta(t) + b_1 \cdot \eta(t-1) + \cdots + b_n \cdot \eta(t-n).$

Thus prepared, let $n$ tend to infinity in (169). Since $\sum b_i^2$
converges, we have

        $D^2(\zeta(t; n+p) - \zeta(t; n)) = (b_{n+1}^2 + \cdots + b_{n+p}^2) \cdot D^2(\eta) \to 0$

uniformly in $p$ as $n \to \infty$. It follows that the sum $\eta(t) + b_1 \eta(t-1) +
b_2 \eta(t-2) + \cdots$ is convergent. Denoting the sum variable by
$\zeta(t)$, we have

(173)   $\zeta(t) = \lim_{n \to \infty} \zeta(t; n) = \eta(t) + b_1 \eta(t-1) + b_2 \eta(t-2) + \cdots$

Further, the variables $\zeta(t)$ and $\psi(t) = \lim \psi(t; n)$ constitute two
stationary processes $\{\zeta(t)\} = \lim_{n \to \infty} \{\zeta(t; n)\}$ and $\{\psi(t)\} = \lim_{n \to \infty} \{\psi(t; n)\}$
respectively.

Observing that (173) yields

(174)   $D^2(\zeta(t)) = (1 + b_1^2 + b_2^2 + \cdots) \cdot \kappa^2 \sigma^2 = \kappa^2 \cdot K^2 \cdot \sigma^2,$

two cases may be distinguished:

(A) $\kappa \cdot K = 1$. Then $D(\psi(t)) = 0$, and

(175)   $\{\xi(t)\} = \{\zeta(t)\};$

(B) $\kappa \cdot K < 1$. Then $D(\psi(t)) > 0$, and

(176)   $\{\xi(t)\} = \{\zeta(t)\} + \{\psi(t)\}.$

Advancing that $\{\psi(t)\}$ is singular, it is seen that (176) covers
both (175) and the cases II and III (see p. 81). Moreover, giving
$\psi(t)$ the same mean as $\xi(t)$, the representation (176) evidently holds
also in case $\{\xi(t)\}$ has a non-vanishing mean.
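Case (B) may be illustrated on a constructed example. The Python sketch below assumes a hypothetical process built as the sum of a purely random component and an uncorrelated alternating component (one obeying $\psi(t) + \psi(t-1) = 0$), each carrying half of the total variance, so that the correlogram is $r_0 = 1$ and $r_k = \frac{1}{2}(-1)^k$ for $k \neq 0$. For this correlogram the determinants have a simple closed form, and the quotient in (156) works out to $(n+2)/(2(n+1))$, tending to $\kappa^2 = 1/2$. Since the regular component is assumed purely random, all $b_i$ with $i > 0$ vanish and $K = 1$, so that $\kappa \cdot K < 1$.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact (fraction-based) Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, value = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            value = -value
        value *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return value

def mixed_correlogram(k):
    """Hypothetical correlogram: half purely random, half alternating."""
    random_part = Fraction(1) if k == 0 else Fraction(0)
    return Fraction(1, 2) * random_part + Fraction(1, 2) * (-1) ** k

def variance_ratio(r, n):
    """Quotient of consecutive correlation determinants, as in (156)."""
    def delta(order):
        return det([[r(abs(i - k)) for k in range(order + 1)]
                    for i in range(order + 1)])
    return delta(n) / delta(n - 1)

RATIOS = [variance_ratio(mixed_correlogram, n) for n in range(1, 5)]
```

The ratios 3/4, 2/3, 5/8, 3/5 decrease slowly towards 1/2, and the remaining share $1 - \kappa^2 K^2 = 1/2$ of the variance is the share assumed for the alternating component.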

Formula (176) is the desired canonical form for a stationary
process with finite dispersion. As already pointed out, the variable
$\zeta(t)$ corresponds directly with the case of a variable $\xi$ in a finite
number of dimensions. Further, according to (164) and (173), our
$\{\zeta(t)\}$ presents a certain similarity to the general process of
linear regression as introduced in section 15 y. However, $\{\zeta(t)\}$ is
still more general, for the variables $\eta(t)$ constituting $\zeta(t)$ are
non-correlated, while those forming the process of linear regression
moreover are independent.

Some characteristic properties of the variables $\zeta(t)$ and $\psi(t)$
appearing in the canonical formula (176) will now be proved, and
the main results then comprehended in a theorem.

Observing that

        $\left( \sum_{i=0}^{\infty} b_{p+i} \cdot b_i \right)^2 \leq \left( \sum_{i=0}^{\infty} b_{p+i}^2 \right) \cdot \left( \sum_{i=0}^{\infty} b_i^2 \right) = K^2 \cdot \sum_{i=p}^{\infty} b_i^2 \to 0$   as $p \to \infty$,

we conclude that $\sum b_i \cdot b_{i+p}$ is convergent for all $p$, and that

(177)   $r_p(\zeta) = \dfrac{E[\zeta(t) \cdot \zeta(t-p)]}{\kappa^2 \cdot K^2 \cdot \sigma^2} = \lim_{n \to \infty} (b_p + b_{p+1} \cdot b_1 + \cdots + b_n \cdot b_{n-p}) / K^2 \to 0$   as $p \to \infty$.

Paying regard to (163), we get further

(178)   $r_p(\zeta) = (b_p + b_{p+1} \cdot b_1 + b_{p+2} \cdot b_2 + \cdots) / K^2 =$

        $= \dfrac{1}{\kappa K} \cdot \lim_{n \to \infty} r(\xi(t), \; \eta(t-p) + b_1 \cdot \eta(t-p-1) + \cdots + b_n \cdot \eta(t-p-n)) =$

        $= r(\xi(t), \zeta(t-p)) / \kappa \cdot K.$

Considering in the second place $\{\psi(t)\}$, we obtain from (170)

(179)   $D^2(\psi(t)) = (1 - \kappa^2 \cdot K^2) \cdot \sigma^2.$

Next, in case $D(\psi) > 0$ we get from (169)

        $r(\psi(t; n); \eta(t+p)) = r(\xi(t) - \eta(t) - b_1 \cdot \eta(t-1) - \cdots - b_n \cdot \eta(t-n); \; \eta(t+p)).$

Keeping in mind that $\eta(t)$ is uncorrelated with any $\xi(t-p)$ and
with any $\eta(t+p)$ for $p \neq 0$, and paying regard to (163), a short
calculation shows that $r(\psi(t; n), \eta(t+p)) = 0$ for $p \geq -n$. It
follows that for any $p \gtreqless 0$

        $r(\psi(t), \eta(t+p)) = \lim_{n \to \infty} r(\psi(t; n), \eta(t+p)) = 0.$

Hence the fundamental relation

(180)   $r(\psi(t), \zeta(t+p)) = \lim_{n \to \infty} r(\psi(t), \zeta(t+p; n)) = 0, \qquad p \gtreqless 0,$

which shows that the processes $\{\psi(t)\}$ and $\{\zeta(t)\}$ are non-correlated.
Thus we have (cf. (59) and (119))

(181)   $r_k(\xi) = \dfrac{D^2(\psi)}{D^2(\xi)} \cdot r_k(\psi) + \dfrac{D^2(\zeta)}{D^2(\xi)} \cdot r_k(\zeta).$


For the preparation of the remaining proof of the singularity of
$\{\psi(t)\}$, let it be observed that the non-correlation between $\eta(t)$ and
$\eta(t+p)$ for $p \neq 0$ implies that for any real number $a_1$,

        $D^2(\zeta(t; n) + a_1 \cdot \zeta(t-1; n)) = D^2(\eta(t) + (a_1 + b_1) \cdot \eta(t-1) +$

        $+ (a_1 \cdot b_1 + b_2) \cdot \eta(t-2) + (a_1 \cdot b_2 + b_3) \cdot \eta(t-3) + \cdots + a_1 \cdot b_n \cdot \eta(t-n-1)) =$

        $= D^2(\eta(t) + B_1 \cdot \eta(t-1) + \cdots + B_{n+1} \cdot \eta(t-n-1)) =$

        $= (1 + B_1^2 + \cdots + B_{n+1}^2) \, \kappa^2 \cdot \sigma^2 \geq \kappa^2 \cdot \sigma^2,$

the auxiliary constants $B_m$ introduced being real. By the same
argument we conclude that for any real $a_1, a_2, \ldots, a_p$

(182)   $D^2(\zeta(t) + a_1 \cdot \zeta(t-1) + \cdots + a_p \cdot \zeta(t-p)) =$

        $= \lim_{n \to \infty} D^2(\zeta(t; n) + a_1 \cdot \zeta(t-1; n) + \cdots + a_p \cdot \zeta(t-p; n)) \geq \kappa^2 \cdot \sigma^2.$

Considering now the variable $\xi(t) + a_1 \cdot \xi(t-1) + \cdots + a_p \cdot \xi(t-p)$,
and paying regard to (176) and (180), a short calculation shows that

(183)   $D^2(\xi(t) + a_1 \cdot \xi(t-1) + \cdots + a_p \cdot \xi(t-p)) =$

        $= D^2(\psi(t) + a_1 \cdot \psi(t-1) + \cdots + a_p \cdot \psi(t-p)) +$

        $+ D^2(\zeta(t) + a_1 \cdot \zeta(t-1) + \cdots + a_p \cdot \zeta(t-p)).$

However, if an $\varepsilon > 0$ is arbitrarily given, (156) implies that a
number $p(\varepsilon)$ and a real sequence, say $a_1 = a_1^*, \ldots, a_p = a_p^*$, exist
such that the left member of (183) is less than $\kappa^2 \cdot \sigma^2 + \varepsilon$. On the
other hand, (182) shows that the second variance in the right
member of (183) is not below $\kappa^2 \cdot \sigma^2$. Hence it follows that

        $D^2(\psi(t) + a_1^* \cdot \psi(t-1) + \cdots + a_p^* \cdot \psi(t-p)) < \varepsilon.$

Since $\varepsilon$ is arbitrary, this relation implies that $\{\psi(t)\}$ is singular of
a finite or infinite rank.

Summing up, we have the following theorem in which one of
the variables $\{\psi(t)\}$ and $\{\zeta(t)\}$ may be vanishing. 11

Theorem 7. Denoting by $\{\xi(t)\}$ an arbitrary discrete stationary
process with finite dispersion, there exists a three-dimensional stationary
process $\{\psi(t), \zeta(t), \eta(t)\}$ with the following properties:

(A) $\{\xi(t)\} = \{\psi(t)\} + \{\zeta(t)\}$.

(B) $\{\psi(t)\}$ and $\{\zeta(t)\}$ are non-correlated.

(C) $\{\psi(t)\}$ is singular.

(D) $\{\eta(t)\}$ is non-autocorrelated, and $E[\eta(t)] = E[\zeta(t)] = 0$.

(E) $\{\zeta(t)\} = \{\eta(t)\} + b_1 \{\eta(t-1)\} + b_2 \{\eta(t-2)\} + \cdots$,

where $b_n$ represent real numbers such that $\sum b_n^2$ is convergent.

Illustrations. In order to illustrate the autoregression analysis, let us consider
a normal stationary process $\{\xi(t)\}$ as defined by the characteristic function (111).
Assuming for the sake of formal simplicity that $m = 0$, and that $\sigma = 1$, we shall
first investigate a sum variable $\zeta[\xi]$ of type (61).

Writing $\zeta(t, t-1, \ldots, t-n) = [\zeta(t), \zeta(t-1), \ldots, \zeta(t-n)]$, we have by
definition

        $\zeta(t) = \xi(t) + a_1 \xi(t-1) + \cdots + a_h \xi(t-h)$
        $\zeta(t-1) = \xi(t-1) + a_1 \xi(t-2) + \cdots + a_h \xi(t-h-1)$
        $\ldots$
        $\zeta(t-n) = \xi(t-n) + a_1 \xi(t-n-1) + \cdots + a_h \xi(t-n-h).$

According to the introductory remarks in section 14, the characteristic function of
$\zeta(t, t-1, \ldots, t-n)$, say $f_n^*(Z_t, Z_{t-1}, \ldots, Z_{t-n}) = e^{-\frac{1}{2} Q_n^*}$, will for all $n \geq h$ be
obtained from the characteristic function $f_{n+h}(X_t, X_{t-1}, \ldots, X_{t-n-h})$ by the
substitution

(184)   $X_t = Z_t$
        $X_{t-1} = a_1 Z_t + Z_{t-1}$
        $\ldots$
        $X_{t-h} = a_h Z_t + a_{h-1} Z_{t-1} + \cdots + Z_{t-h}$
        $X_{t-h-1} = a_h Z_{t-1} + \cdots + a_1 Z_{t-h} + Z_{t-h-1}$
        $\ldots$
        $X_{t-n} = a_h Z_{t-n+h} + \cdots + a_1 Z_{t-n+1} + Z_{t-n}$
        $X_{t-n-1} = a_h Z_{t-n+h-1} + \cdots + a_2 Z_{t-n+1} + a_1 Z_{t-n}$
        $\ldots$
        $X_{t-n-h+1} = a_h Z_{t-n+1} + a_{h-1} Z_{t-n}$
        $X_{t-n-h} = a_h Z_{t-n}.$

We conclude that the distribution of $\zeta(t, t-1, \ldots, t-n)$ is normal, and that we
obtain $Q_n^*$ from $Q_{n+h}$ by the substitution (184).

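According to the substitution theory of quadratic forms, the matrix defining $Q_n^*$
is obtained as follows (see e.g. G. Kowalewski (1909), § 94). Writing $A_{n+h}(\xi)$ for
the matrix of $Q_{n+h}$, and $B_{h,n}$ for the matrix of the substitution (184), we have

        $A_{n+h}(\xi) = \begin{bmatrix} 1 & r_1 & r_2 & \cdots & r_{n+h} \\ r_1 & 1 & r_1 & \cdots & r_{n+h-1} \\ r_2 & r_1 & 1 & \cdots & r_{n+h-2} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{n+h} & r_{n+h-1} & r_{n+h-2} & \cdots & 1 \end{bmatrix}$

        $B_{h,n} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ a_1 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & & & & & \vdots \\ a_h & a_{h-1} & a_{h-2} & \cdots & 1 & 0 & \cdots & 0 \\ 0 & a_h & a_{h-1} & \cdots & a_1 & 1 & \cdots & 0 \\ \vdots & & & & & & & \vdots \\ 0 & 0 & \cdots & a_h & a_{h-1} & \cdots & a_1 & 1 \\ 0 & 0 & \cdots & 0 & a_h & \cdots & a_2 & a_1 \\ \vdots & & & & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & a_h & a_{h-1} \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & a_h \end{bmatrix}$

In the first place we must form a product matrix consisting of $n + h + 1$ rows and
$n + 1$ columns, and take for the $i$th element in the $j$th row the inner product of the $j$th row
in $A_{n+h}(\xi)$ by the $i$th column in $B_{h,n}$. Multiplying then $B_{h,n}$ and the product
matrix by columns, we arrive at the matrix required. Denoting this by $A_n(\zeta)$,
and by $M'$ the transposed of a matrix $M$, we have

        $A_n(\zeta) = B_{h,n}' \cdot A_{n+h}(\xi) \cdot B_{h,n}.$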
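The matrix product just described can be carried out numerically on a small example. In the Python sketch below everything specific is an assumption made for illustration: the correlogram $r_k = 2^{-k}$ of the original process, the single coefficient $a_1 = -1/2$ (so that $h = 1$), the choice $n = 2$, and the helper names. With these choices the transformed variable is the difference $\xi(t) - \frac{1}{2} \xi(t-1)$, whose matrix turns out to be diagonal.

```python
from fractions import Fraction

def toeplitz(r, size):
    """Matrix with elements r(|i - k|), as for the correlation matrix of the process."""
    return [[r(abs(i - k)) for k in range(size)] for i in range(size)]

def substitution_matrix(a, n):
    """Matrix of (n+h+1) rows and (n+1) columns; column j holds 1, a_1, .., a_h from row j on."""
    h = len(a)
    coeffs = [Fraction(1)] + list(a)
    return [[coeffs[i - j] if 0 <= i - j <= h else Fraction(0)
             for j in range(n + 1)] for i in range(n + h + 1)]

def transpose(m):
    return [list(column) for column in zip(*m)]

def matmul(p, q):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*q)]
            for row in p]

def geometric(k):
    """Hypothetical correlogram r_k = (1/2)^k of the original process."""
    return Fraction(1, 2) ** k

A_COEFFS = [Fraction(-1, 2)]     # hypothetical a_1, so that h = 1
N = 2
B = substitution_matrix(A_COEFFS, N)
A_XI = toeplitz(geometric, N + len(A_COEFFS) + 1)
A_ZETA = matmul(matmul(transpose(B), A_XI), B)
```

Here `A_ZETA` equals 3/4 times the unit matrix of order three: for the assumed correlogram the transformed variables are mutually uncorrelated with variance 3/4.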

Forming in the same way the infinite matrix

(185)   $A(\zeta) = B' \cdot A(\xi) \cdot B$

where

(186)   $B = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots \\ a_1 & 1 & 0 & 0 & \cdots \\ a_2 & a_1 & 1 & 0 & \cdots \\ \vdots & & & & \\ a_h & a_{h-1} & a_{h-2} & \cdots & a_1 & 1 & 0 & 0 & \cdots \\ 0 & a_h & a_{h-1} & a_{h-2} & \cdots & a_1 & 1 & 0 & \cdots \\ 0 & 0 & a_h & a_{h-1} & \cdots & a_2 & a_1 & 1 & \cdots \\ \vdots & & & & & & & & \end{bmatrix}, \qquad A(\xi) = \begin{bmatrix} 1 & r_1 & r_2 & \cdots \\ r_1 & 1 & r_1 & \cdots \\ r_2 & r_1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix},$

it is readily verified that $A_n(\zeta)$ equals the principal minor of order $n + 1$ in $A(\zeta)$. We
conclude that the distribution of $\zeta(t, t-1, \ldots)$ is normal, and that $A(\zeta)$ is the
matrix of the infinite quadratic form $Q^*(Z_t, Z_{t-1}, \ldots)$ appearing in the exponent of
the characteristic function of $\zeta(t, t-1, \ldots)$.

Formally, the above procedure applies even in the case of an infinite sequence
$(a_1, a_2, \ldots)$. It is seen that if the double series appearing in the matrix $A(\zeta)$ are
absolutely convergent, the variable $\zeta(t) = \sum_{i=0}^{\infty} a_i \xi(t-i)$ will be well-defined, and
constitute a normal process. The characteristic function of the variable $\zeta(t, t-1, \ldots)$
will be given by $e^{-\frac{1}{2} Q^*}$.

Next we shall consider a few particular instances. 

Let $(1, b_1, b_2, \ldots)$ represent a real sequence such that $K^2 = 1 + \sum b_i^2$ is finite,
and let $\{\eta(t)\}$ be a normal and purely random process with vanishing mean, and
dispersion equalling unity. Considering the variable $\zeta(t) = \eta(t) + b_1 \eta(t-1) +
b_2 \eta(t-2) + \cdots$, it is readily verified that the above substitution procedure
gives $A(\zeta) = B' \cdot A(\eta) \cdot B$, where

        $A(\eta) = \begin{bmatrix} 1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots \\ b_1 & 1 & 0 & 0 & \cdots \\ b_2 & b_1 & 1 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}, \quad A(\zeta) = \begin{bmatrix} K^2 & K^2 r_1 & K^2 r_2 & \cdots \\ K^2 r_1 & K^2 & K^2 r_1 & \cdots \\ K^2 r_2 & K^2 r_1 & K^2 & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix},$

having in the latter matrix written $r_k$ for $(b_k + b_1 b_{k+1} + b_2 b_{k+2} + \cdots) / K^2$.
These results are seen to be in full agreement with formula (177), and with the
correlation properties of a normal stationary process (see p. 62).

Considering in the second place a process $\{\xi(t)\}$ which is singular of rank $h$,
there exists, by definition, a sequence $(a_1, \ldots, a_h)$ such that the relation (77) is
satisfied. Accordingly, the infinite matrix $A(\zeta) = B' \cdot A(\xi) \cdot B$ formed by means
of the matrices (186), will consist entirely of zeros. For instance, letting $\xi(t) +
\xi(t-1) = 0$ be the relation of singularity, we have $h = a_1 = 1$ (cf. (107)). A
short calculation will show that

        $A(\xi) = \begin{bmatrix} 1 & -1 & 1 & -1 & \cdots \\ -1 & 1 & -1 & 1 & \cdots \\ 1 & -1 & 1 & -1 & \cdots \\ -1 & 1 & -1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots \\ 1 & 1 & 0 & 0 & \cdots \\ 0 & 1 & 1 & 0 & \cdots \\ 0 & 0 & 1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}, \quad A(\zeta) = \begin{bmatrix} 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 0 & \cdots \\ 0 & 0 & 0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{bmatrix}.$

If $\{\xi(t)\}$ is singular of infinite rank, there exists, for every integer $n$ and every
$\varepsilon > 0$, a number $h(\varepsilon, n)$ and a sequence $a_i(\varepsilon, n)$ such that the variable $\xi(t) +
a_1 \xi(t-1) + \cdots + a_h \xi(t-h)$ will give rise to a matrix $A_n(\zeta)$, whose elements
are all less than $\varepsilon$ in modulus.

Proceeding to the operation of summing independent processes, let $\{\zeta(t)\}$ and
$\{\psi(t)\}$ stand for two independent normal processes. Denoting the sum process by
$\{\xi(t)\}$, and indicating all symbols referring to the three processes by $\xi$, $\zeta$ and $\psi$
respectively, and paying regard to the evident fact that the characteristic function
of $\xi(t, t-1, \ldots)$ is the mathematical product of the characteristic functions of the
variables $\zeta(t, t-1, \ldots)$ and $\psi(t, t-1, \ldots)$, we obtain $D^2(\xi) \cdot Q(\xi) = D^2(\zeta) \cdot Q(\zeta) +
D^2(\psi) \cdot Q(\psi)$, i.e.

        $D^2(\xi) \cdot \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} r_{|p-q|}(\xi) \cdot X_{t-p} X_{t-q} = D^2(\zeta) \cdot \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} r_{|p-q|}(\zeta) \cdot X_{t-p} X_{t-q} +$

        $+ D^2(\psi) \cdot \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} r_{|p-q|}(\psi) \cdot X_{t-p} X_{t-q}.$

In full agreement with (59) we obtain the relation (181), and observe that mutually
uncorrelated normal processes are always independent.

The simple types of normal process mentioned above are sufficient to illustrate
theorem 7. Starting from a purely random normal process $\{\eta(t)\}$ with suitable
dispersion, forming a sum process $\{\zeta(t)\}$ of type $\{\eta(t)\} + b_1 \{\eta(t-1)\} +
b_2 \{\eta(t-2)\} + \cdots$, and adding an independent normal process $\{\psi(t)\}$ ruled by
an appropriate singularity, we shall arrive at an arbitrarily prescribed, normal and
stationary process $\{\xi(t)\}$.



CHAPTER III. 


On the theory of some special stationary processes. 

21. On the concept of stochastical difference equation. 

The relation

(187)   $\{\xi(t)\} + a_1 \cdot \{\xi(t-1)\} + \cdots + a_h \cdot \{\xi(t-h)\} = \{\eta(t)\}$

arrived at in section 15 d presents a formal analogy with an
ordinary, or functional, difference equation

(188)   $x(t) + a_1 \cdot x(t-1) + \cdots + a_h \cdot x(t-h) = y(t).$

We have found under certain conditions concerning the coefficients
$a_i$, and in case the process $\{\eta(t)\}$ is purely random, that there
exists a stationary process $\{\xi(t)\}$ which satisfies (187) and is of type

(189)   $\{\xi(t)\} = \{\eta(t)\} + b_1 \cdot \{\eta(t-1)\} + b_2 \cdot \{\eta(t-2)\} + \cdots$

On the other hand, since under general conditions a solution of
(188) will be of the form

(190)   $x(t) = y(t) + b_1 \cdot y(t-1) + b_2 \cdot y(t-2) + \cdots$

a clear formal analogy may be seen between (187) and (188) also
in respect of the solutions.

Expressing the situation in words, a solution of type (190) of a
functional difference equation is a moving average performed on
the function $y(t)$ in the right member. Correspondingly, any sample
series $(\xi_t, \xi_{t-1}, \ldots)$ connected with a process $\{\xi(t)\}$ of type (189) may
be looked upon as a moving average of a purely random series
$(\eta_t, \eta_{t-1}, \ldots)$.
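The correspondence between the difference equation (188) and the moving average (190) can be made concrete. The Python sketch below is a finite illustration under assumptions not made in the text: the coefficients $a_1 = -1/2$, $a_2 = 1/4$ and the series $y$ are hypothetical, and both $x$ and $y$ are taken to vanish before $t = 0$, so that the moving average is a finite sum. The weights are obtained from the recursion $b_0 = 1$, $b_k = -(a_1 b_{k-1} + \cdots + a_h b_{k-h})$, which follows on inserting (190) in (188) and comparing coefficients.

```python
from fractions import Fraction

def moving_average_weights(a, count):
    """Weights b_0..b_{count-1}: b_0 = 1, b_k = -(a_1 b_{k-1} + ... + a_h b_{k-h})."""
    b = [Fraction(1)]
    for k in range(1, count):
        b.append(-sum(a[i - 1] * b[k - i] for i in range(1, min(k, len(a)) + 1)))
    return b

def solve_recursively(a, y):
    """x(t) from the difference equation itself, with x = 0 before t = 0."""
    x = []
    for t, y_t in enumerate(y):
        x.append(y_t - sum(a[i - 1] * x[t - i] for i in range(1, min(t, len(a)) + 1)))
    return x

def solve_by_moving_average(a, y):
    """x(t) as the moving average b_0 y(t) + b_1 y(t-1) + ... of the series y."""
    b = moving_average_weights(a, len(y))
    return [sum(b[k] * y[t - k] for k in range(t + 1)) for t in range(len(y))]

A_COEFFS = [Fraction(-1, 2), Fraction(1, 4)]   # hypothetical a_1, a_2
Y = [1, 2, 0, -1, 3]                           # hypothetical right member y(0..4)
```

Both routes give the same series $x$, which illustrates in the finite case why a solution of (188) may be looked upon as a moving average of $y$.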

Because of this parallelism, I propose to call (187) a stochastical
difference relation between the processes $\{\xi(t)\}$ and $\{\eta(t)\}$. If $\{\eta(t)\}$
is known, and $\{\xi(t)\}$ unknown, (187) will be termed a stochastical
difference equation. An interpretation in the language of the theory
of oscillatory mechanisms will reveal some interesting connexions
between functional and stochastical difference equations, and exemplify
the wide applicability of the new concept.

An oscillatory mechanism presents certain intrinsic features
relevant to the structure of the movement considered. Studying
the movement in integral time points, these features are summed
up in the relation

(191)   $x(t) + a_1 \cdot x(t-1) + \cdots + a_h \cdot x(t-h) = 0.$

Interpreting (191) as an ordinary difference equation, the solutions
(see section 6) describe how the phenomenon would develop out
from any initial values, say $x(t-1) = x_{t-1}, \ldots, x(t-h) = x_{t-h}$, if
there were no external influence present.

In the ordinary difference equation (188), the external factors are
taken into account by means of the function $y(t)$. Thus, instead
of the value $-a_1 \cdot x_{t-1} - a_2 \cdot x_{t-2} - \cdots - a_h \cdot x_{t-h}$ to be expected
for $x(t)$ when the earlier values are known, the variable in question
takes on the value $y(t) - a_1 \cdot x_{t-1} - \cdots - a_h \cdot x_{t-h}$. In this approach
the external influence is dealt with as functional, i. e. uniquely
determined at any future time point.

The stochastical approach differs in the allowance for the external 
influence upon the mechanism. Here the external factors are not 
dealt with as functionally determined; they are only assumed to 
be ruled by certain probability laws. These laws constitute the 
stochastical process $\{\eta(t)\}$; in general, the probability laws are
subjected only to the conditions (53)-(54) which express that
the laws must not contradict themselves (cf. p. 3). The simple
case investigated in section 15 $\delta$ corresponds to a purely random
effect of the external factors, but nothing prevents us from approximating
the external influence by a non-purely-random, or even
a non-stationary process $\{\eta(t)\}$.

Having fixed a $\{\eta(t)\}$, any sample series $(\dots, \eta_{t-1}, \eta_t, \eta_{t+1}, \dots)$
will describe an actual realization of the external development.
Since the movement of the mechanism is known when the external
factors are determined, the sample series $(\dots, \eta_{t-1}, \eta_t, \eta_{t+1}, \dots)$
considered will correspond to a certain sample series, say
$(\dots, \xi_{t-1}, \xi_t, \xi_{t+1}, \dots)$, of the process $\{\xi(t)\}$. However, as we possess only
probability knowledge of the actual path $(\dots, \eta_{t-1}, \eta_t, \eta_{t+1}, \dots)$ of
the external factors, we can reach only probability laws about the
behaviour of the oscillatory mechanism. These probability laws
concerning $(\dots, \xi_{t-1}, \xi_t, \xi_{t+1}, \dots)$ constitute the process $\{\xi(t)\}$, and
form a solution of the stochastical difference equation.

By means of the probability laws found for the mechanism, it 
will be possible to give information as to the average behaviour of 
the phenomenon considered, i.e. as to the expectations referring to 
$\{\xi(t)\}$. It should be observed that we cannot say in advance that
the conclusions as to the average behaviour will be identical to
those drawn from the functional difference equation (191). Nevertheless,
it has often been argued, more or less explicitly,
that any intrinsic tendency of the mechanism to produce periodic 
oscillations will, on the average, give rise to a corresponding 
oscillation in the phenomenon when influenced by random shocks 
(see e. g. Sir G. Walker (1931), p. 522, and R. Frisch (1933), p. 202). 
However, the following analysis will show that there are important 
instances when such inference based on analogy is incorrect, 
qualitatively as well as quantitatively. For instance, let
$C \cdot \varrho^t \cdot \cos(\lambda t + \varphi)$ represent a solution of (191), i. e. a damped oscillation
characteristic of the mechanism, and consider the simple case of
purely random external shocks. Then, even if there is no other
intrinsic tendency to oscillation present, a periodogram analysis for
the search of the frequency $\lambda$ will be more or less misleading.
As will be shown in section 25, there is in general a systematic
deviation between $\lambda$ and the abscissa for which the expectation of
the periodogram ordinate presents a maximum. It may even happen
that there is no maximum at all in the neighbourhood of $\lambda$.
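The deviation just described can be made tangible by a small numerical sketch. It rests on assumptions that are not taken from this section: a mechanism of second order whose characteristic roots are $\varrho e^{\pm i\lambda}$, so that $a_1 = -2\varrho\cos\lambda$ and $a_2 = \varrho^2$, and an expected periodogram ordinate taken to be proportional to $1/|1 + a_1 e^{ix} + a_2 e^{2ix}|^2$ (the usual spectral form for such a scheme). The function names are hypothetical.

```python
import math

def expected_shape(rho, lam, x):
    """Assumed shape 1/|1 + a1 e^{ix} + a2 e^{2ix}|^2 for roots rho*exp(+-i*lam)."""
    a1 = -2.0 * rho * math.cos(lam)
    a2 = rho * rho
    denom = (1.0 + a1 * a1 + a2 * a2
             + 2.0 * a1 * (1.0 + a2) * math.cos(x)
             + 2.0 * a2 * math.cos(2.0 * x))
    return 1.0 / denom

def peak_abscissa(rho, lam, grid=20000):
    """Grid search on [0, pi] for the abscissa of the largest expected ordinate."""
    xs = [math.pi * i / grid for i in range(grid + 1)]
    return max(xs, key=lambda x: expected_shape(rho, lam, x))

rho, lam = 0.7, math.pi / 4
peak = peak_abscissa(rho, lam)
# Setting the derivative of the denominator to zero gives
# cos(x*) = cos(lam) * (1 + rho^2) / (2 * rho), which differs from cos(lam).
predicted = math.acos(math.cos(lam) * (1.0 + rho * rho) / (2.0 * rho))

# With stronger damping the right member above exceeds 1 and no interior
# maximum exists; the grid search then returns the end point x = 0.
flat_case = peak_abscissa(0.5, math.pi / 6)
```

Under these assumptions the peak lies below $\lambda$ in the first case and disappears from the interior of $(0, \pi)$ in the second, which is the qualitative behaviour announced in the text.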

As soon as external influence cannot be considered free from
random elements, the stochastical difference equation should be
preferred to the functional equation (cf. the quotations from G. U.
Yule (1927) in section 10). It is also obvious that the former
embraces the latter as a special case, for any function y(t) may be 
interpreted as a singular random process. Thus, the stochastical 
difference equations seem to merit particular interest. 

When omitting all dispensable conditions as to $\{\eta(t)\}$ and the
$a_k$'s, the solutions of the stochastical difference equations become
of a very general type, and embrace fundamentally different classes
of random process. Having already seen in section 15 $\delta$ that the
solutions cover the stationary processes of linear autoregression,
let us in the second place consider the special equation

(192)    $\{\xi(t)\} - \{\xi(t-1)\} = \{\eta(t)\}$,

taking as before a purely random process for $\{\eta(t)\}$. The solutions
of this equation are seen to form a type case of the discrete homogeneous
process (see e. g. H. Cramer (1937), Ch. VIII). In sharp
contrast to the stationary processes, the oscillations here tend to
increase in amplitude as time goes on. We may express this fact
by saying that the homogeneous process is evolutive (cf. p. 1). In the
particular case (192), the process $\{\xi(t)\}$ cannot be assumed to have
been in movement during an infinite past. Accordingly, and in
contradistinction to the stationary case, the analysis of this equation
generally has to be restricted to an interval of type $(t_0 \le t \le \infty)$.

As already pointed out, there are many problems calling for 
investigation in connexion with the general stochastical difference 
equation. The coming section is reserved for some groundwork 
concerning such equations. In accordance with the program of the 
present study, non-stationary solutions will be dealt with only very 
briefly. 


22. Some fundamentals concerning stochastical difference equations. 
According to the definition given in the previous section, 

(193)    $\{\xi(t)\} + a_1 \cdot \{\xi(t-1)\} + \dots + a_h \cdot \{\xi(t-h)\} = \{\eta(t)\}$

forms a stochastical difference equation in $\{\xi(t)\}$ if the coefficients
$a_k$ are real, and if $\{\eta(t)\}$ is a discrete random process. If $a_h \ne 0$,
the equation will be termed of order $h$.

Let first an equation of order $h$ with vanishing right member be
considered,

(194)    $\{\xi(t)\} + a_1 \cdot \{\xi(t-1)\} + \dots + a_h \cdot \{\xi(t-h)\} = 0$.

If there are any solutions to this equation, these will be singular 
in the sense indicated in section 14, and have (77) with m = 0 for 
relation of singularity. Being particularly interested in stationary 
solutions with finite dispersion, it follows from the analysis in 
section 14 that there exists a non-vanishing stationary process which 
satisfies (194) if, and only if, the characteristic equation (34) has 
at least one root on the circumference of the unit circle. 

It is seen that if $\{\xi(t)\}$ and $\{\psi(t)\}$ are stochastic processes
such that $\{\xi(t)\}$ is a solution of (193), while $\{\psi(t)\}$ satisfies (194),
then $\{\xi(t)\} + \{\psi(t)\}$ will satisfy (193). This property of the stochastical
difference equation forms another analogy to the functional
case.

Secondly, we shall touch upon the case when the variable $\{\eta(t)\}$
appearing in (193) is of the type $\{\eta(t_0; t)\} = [\dots, 0, 0, \eta(t_0), \eta(t_0+1),
\eta(t_0+2), \dots]$ considered in connexion with theorem 1. It is evident
that this equation has one, and only one, solution of type $\{\xi(t_0; t)\} =
[\dots, 0, 0, \xi(t_0), \xi(t_0+1), \dots]$, and that this solution is given by the
following system,

    $\xi(t_0) = \eta(t_0)$
    $\xi(t_0+1) = \eta(t_0+1) - a_1 \cdot \xi(t_0) = \eta(t_0+1) - a_1 \cdot \eta(t_0)$
    $\xi(t_0+2) = \eta(t_0+2) - a_1 \cdot \xi(t_0+1) - a_2 \cdot \xi(t_0) = \eta(t_0+2) - a_1 \cdot \eta(t_0+1) + (a_1^2 - a_2) \cdot \eta(t_0)$


The general variable $\xi(t_0 + t)$ is evidently of type

(195)    $\xi(t_0 + t) = \eta(t_0 + t) + b_1 \cdot \eta(t_0 + t - 1) + b_2 \cdot \eta(t_0 + t - 2) + \dots + b_t \cdot \eta(t_0)$.

The coefficients $b_k$ introduced are seen to be identical to those
used in section 15 $\delta$. Thus, $b_1, b_2, \dots, b_h$ will be obtained from the
system (97), and the following ones from the difference equation
(96). The following elementary theorem concerning the coefficients
$b_k$ will prove useful. 12
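The step-by-step system and the closed form (195) can be compared numerically. The following is a minimal sketch, assuming that (96) and (97) together amount to the recursion $b_0 = 1$, $b_k + a_1 b_{k-1} + \dots + a_h b_{k-h} = 0$ with terms of negative index dropped, as the system above suggests; the function names and the numerical coefficients are illustrative only.

```python
import random

def b_coefficients(a, n):
    """b_0..b_n with b_0 = 1 and b_k = -(a_1 b_{k-1} + ... + a_h b_{k-h}),
    dropping terms whose index would be negative (assumed reading of (96)-(97))."""
    h = len(a)
    b = [1.0]
    for k in range(1, n + 1):
        b.append(-sum(a[j - 1] * b[k - j] for j in range(1, min(h, k) + 1)))
    return b

def solve_forward(a, eta):
    """Solve xi(t) + a_1 xi(t-1) + ... + a_h xi(t-h) = eta(t) step by step,
    with xi = 0 before the first time point (the system preceding (195))."""
    h = len(a)
    xi = []
    for t, e in enumerate(eta):
        xi.append(e - sum(a[j - 1] * xi[t - j] for j in range(1, min(h, t) + 1)))
    return xi

def solve_by_weights(a, eta):
    """The same solution written as the weighted sum (195)."""
    b = b_coefficients(a, len(eta) - 1)
    return [sum(b[k] * eta[t - k] for k in range(t + 1)) for t in range(len(eta))]

rng = random.Random(0)
a = [-1.1, 0.5]                                   # illustrative a_1, a_2
eta = [rng.gauss(0.0, 1.0) for _ in range(50)]    # one sample series of shocks
xi_rec = solve_forward(a, eta)
xi_sum = solve_by_weights(a, eta)
max_gap = max(abs(u - v) for u, v in zip(xi_rec, xi_sum))
```

The two constructions agree up to rounding, and `b_coefficients(a, 2)[2]` reproduces the coefficient $a_1^2 - a_2$ appearing in the third line of the system.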

Theorem 8. The series $b_k$ defined by (96) and (97) does not satisfy
any difference equation of type (32) of lower order than $h$.

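Writing $b_k$ on the form (33), let us examine the values, say $b(t)$,
of this analytical function taken on for $t \le 0$. Since every linear
difference relation which is satisfied by this function for $t > 0$ must
hold also for $t < 0$ and vice versa, it is sufficient to verify theorem
8 for $t < 0$. According to the difference equation (96), we get
$m = 0$ and

(196)    $b_1 + a_1 \cdot b(0) + a_2 \cdot b(-1) + \dots + a_h \cdot b(-h+1) = 0$
         $b_2 + a_1 \cdot b_1 + a_2 \cdot b(0) + \dots + a_h \cdot b(-h+2) = 0$
         $\dots$
         $b_h + a_1 \cdot b_{h-1} + \dots + a_{h-1} \cdot b_1 + a_h \cdot b(0) = 0$.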
Identifying the left members of the $h$th equations in the two systems
(97) and (196), we obtain $a_h \cdot b(0) = a_h$. Since $a_h \ne 0$, this relation
gives

(197)    $b(0) = 1$.

Inserting $b(0) = 1$ in the system (196), the $(h-1)$th equations of the
two systems considered give $a_h \cdot b(-1) = 0$, or $b(-1) = 0$. In the same
way we obtain successively

(198)    $0 = b(-1) = b(-2) = \dots = b(-h+1)$.

Now, if $b_k$ also satisfied a difference equation of lower order than
$h$, it would follow from (198) that $b(0) = 0$, which contradicts (197).

We conclude from the theorem just proved that none of the
individual components will be identically vanishing when $b_k$ is
written on the form (33). Thus, according to section 6, the two
series $\sum |b_k|$ and $\sum b_k^2$ will be convergent if, and only if, all roots
of the equation (34) are lying within the boundary of the unit
circle. This corollary is important to the following.

Recurring to the solution $\{\xi(t_0; t)\}$, let the case be considered
when the variables $\eta(t)$ are independent, and have identical distribution
functions. Then we obtain from (195)

    $D^2(\xi(t_0; t)) = D^2(\eta(t)) \cdot [1 + b_1^2 + b_2^2 + \dots + b_{t-t_0}^2]$.

Since $D(\eta) > 0$, the process $\{\xi(t_0; t)\}$ thus will be evolutive if the
series $\sum_{k=1}^{\infty} b_k^2$ is divergent. According to the above, divergence takes
place if one or more of the roots of the characteristic equation are
lying on the boundary of or outside the unit circle.
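A short numerical sketch may illustrate the contrast between the evolutive and the damped case. It assumes the same reading of the recursion for $b_k$ as above ($b_0 = 1$, terms of negative index dropped) and evaluates the bracket $1 + b_1^2 + \dots + b_n^2$ for the equation (192), where $a_1 = -1$, and for an illustrative damped equation with $a_1 = -0.5$.

```python
def b_coefficients(a, n):
    """b_0..b_n with b_0 = 1 and b_k = -(a_1 b_{k-1} + ... + a_h b_{k-h}),
    dropping terms of negative index (assumed reading of (96)-(97))."""
    h = len(a)
    b = [1.0]
    for k in range(1, n + 1):
        b.append(-sum(a[j - 1] * b[k - j] for j in range(1, min(h, k) + 1)))
    return b

def variance_factor(a, n):
    """The bracket 1 + b_1^2 + ... + b_n^2 multiplying D^2(eta)."""
    return sum(x * x for x in b_coefficients(a, n))

# Equation (192): root on the unit circle, every b_k equals 1.
walk = [variance_factor([-1.0], n) for n in (10, 100, 1000)]

# Illustrative damped case: root 0.5 inside the unit circle.
damped = [variance_factor([-0.5], n) for n in (10, 100, 1000)]
```

In the first case the factor grows without bound (it equals $n + 1$), while in the second it settles at $1/(1 - 0.25) = 4/3$.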

The process of linear autoregression is, by construction, a solution
of a stochastical difference equation such that (A) the variable
in the right member is purely random, and (B) all roots of the
characteristic equation are of a modulus less than unity. Leaving
aside the question of whether there are other equations with
stationary solutions (it seems likely that certain equations involving
a singular and stationary $\{\eta(t)\}$ and where all the roots of
the characteristic equation are of modulus unity are satisfied by
suitable stationary and singular processes $\{\xi(t)\}$), this short introduction
will be terminated by the proof of the following theorem,
which shows that the condition (A) can be generalized in as much
as it is sufficient that the right member is stationary. In the language
of the theory of oscillatory mechanisms, the theorem states that a
mechanism whose intrinsic movements are damped will give rise to
a stationary oscillation when influenced by external shocks of a
stationary kind.

Theorem 9. Let (193) be a stochastical difference equation such
that all roots of its characteristic equation are of a modulus less than
unity, let $\{\eta(t)\}$ be stationary and have a finite dispersion, and let the
sequence $b_1, b_2, \dots$ be given by (96) and (97). Then

    $\lim_{n \to \infty} [\{\eta(t)\} + b_1 \cdot \{\eta(t-1)\} + b_2 \cdot \{\eta(t-2)\} + \dots + b_n \cdot \{\eta(t-n)\}]$

will exist, and form a stationary solution of the equation.

Observing that for any $m > 0$

    $D^2(b_n \cdot \eta(t-n) + b_{n+1} \cdot \eta(t-n-1) + \dots + b_{n+m} \cdot \eta(t-n-m)) \le$
    $\le (|b_n| + |b_{n+1}| + \dots + |b_{n+m}|)^2 \cdot D^2(\eta(t)) \to 0$ as $n \to \infty$,

we conclude in the first place that

    $\lim_{n \to \infty} [\eta(t) + b_1 \cdot \eta(t-1) + \dots + b_n \cdot \eta(t-n)]$

will exist. Denoting the limit variable by $\xi(t)$, an application of
theorem 1 shows that the variables $\xi(t)$ arrived at will constitute a
stationary process $\{\xi(t)\}$. That this process satisfies (193), may be
proved in the same way as the identity (100).
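The statement of theorem 9 can be checked on a sample series. The sketch below is illustrative only: it assumes the recursion $b_0 = 1$, $b_k = -(a_1 b_{k-1} + \dots + a_h b_{k-h})$ for the coefficients, takes for $\{\eta(t)\}$ an autocorrelated stationary series built as a moving average of two independent shocks, and verifies that the truncated sum nearly satisfies (193) when $n$ is large.

```python
import random

def b_coefficients(a, n):
    """b_0..b_n with b_0 = 1 and b_k = -(a_1 b_{k-1} + ... + a_h b_{k-h}),
    dropping terms of negative index (assumed reading of (96)-(97))."""
    h = len(a)
    b = [1.0]
    for k in range(1, n + 1):
        b.append(-sum(a[j - 1] * b[k - j] for j in range(1, min(h, k) + 1)))
    return b

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(400)]
# Stationary but autocorrelated right member (a hypothetical choice).
eta = [eps[t] + 0.6 * eps[t - 1] for t in range(1, 400)]

a = [-1.1, 0.5]          # characteristic roots of modulus sqrt(0.5) < 1
n = 80                   # truncation point of the sum in theorem 9
b = b_coefficients(a, n)

def xi_n(t):
    """Truncated sum eta(t) + b_1 eta(t-1) + ... + b_n eta(t-n)."""
    return sum(b[k] * eta[t - k] for k in range(n + 1))

# Residual of (193) when xi is replaced by the truncated sum.
residuals = [xi_n(t) + a[0] * xi_n(t - 1) + a[1] * xi_n(t - 2) - eta[t]
             for t in range(n + 2, len(eta))]
max_residual = max(abs(r) for r in residuals)
```

The residual consists only of terms carrying the factors $b_{n+1}$ and $a_2 b_n$, so it shrinks geometrically as $n$ increases.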


23. On the stationary processes with finite dispersion and with no 

singular component. 

When writing a stationary process with finite dispersion on the 
canonical form (176) it may happen that the singular component is
vanishing (cf. section 15 $\gamma$). In this section, we shall derive some
general formulae concerning such processes. In the coming sections 
of the present chapter, we shall use these formulae, and the previous 
analysis of stochastical difference equations, for a detailed study of 
the processes of moving averages and of linear autoregression. 





Let $\{\xi(t)\}$ and $\{\eta(t)\}$ represent two stationary processes such that

(199)    $\xi(t) = \eta(t) + b_1 \cdot \eta(t-1) + b_2 \cdot \eta(t-2) + \dots$
(200)    $\eta(t) = \xi(t) + a_1 \cdot \xi(t-1) + a_2 \cdot \xi(t-2) + \dots$

where

(A) $\{\eta(t)\}$ is non-autocorrelated,

(B) $D[\eta(t)] > 0$ is finite,

(C) $E[\eta(t)] = 0$,

(D) the sum $\sum b_k^2$ is convergent.

Thanks to the convergence of $\sum b_k^2$, the formulae involving only
the coefficients $b_k$ will all have a real meaning. On the other hand,
the coefficients $a_k$ have been introduced in a purely formal way,
and all questions concerning their existence will be left open for
later treatment in connexion with the analysis of special cases. As
a matter of fact, the expressions which involve the coefficients
$a_k$ will be used mainly as a formal comprehension of the special
cases of moving averages and of linear autoregression.

Replacing $t$ in (199) by $t-1$, $t-2$, $\dots$, and inserting in (200),
we obtain the following relations between the coefficients $a_k$ and $b_k$,

(201)    $a_k + b_1 \cdot a_{k-1} + \dots + b_{k-1} \cdot a_1 + b_k = 0$,    $k = 1, 2, \dots$

If the set $(a_k)$ is given, the set $(b_k)$ thus will be uniquely determined,
and vice versa. We obtain for the first few coefficients

(202)    $a_1 = -b_1$
         $a_2 = -b_2 + b_1^2$
         $a_3 = -b_3 + 2 b_1 b_2 - b_1^3$
         $a_4 = -b_4 + 2 b_1 b_3 - 3 b_1^2 b_2 + b_2^2 + b_1^4$

         $b_1 = -a_1$
         $b_2 = -a_2 + a_1^2$
         $b_3 = -a_3 + 2 a_1 a_2 - a_1^3$
         $b_4 = -a_4 + 2 a_1 a_3 - 3 a_1^2 a_2 + a_2^2 + a_1^4$.
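The relations (201) lend themselves to a direct computation. The following sketch (function name and numerical values are illustrative) determines one set of coefficients from the other by solving (201) successively for $k = 1, 2, \dots$, and compares the result with the explicit expressions (202).

```python
def invert_coefficients(c, n):
    """Given c_1..c_n, return d_1..d_n satisfying
    d_k + c_1 d_{k-1} + ... + c_{k-1} d_1 + c_k = 0 for k = 1..n, cf. (201).
    Missing c_k beyond the supplied list are treated as zero."""
    def coef(k):
        return c[k - 1] if k <= len(c) else 0.0
    d = []
    for k in range(1, n + 1):
        # d[k - j - 1] holds d_{k-j} because the list is 0-indexed.
        d.append(-(coef(k) + sum(coef(j) * d[k - j - 1] for j in range(1, k))))
    return d

b = [0.5, 0.25, 0.1, 0.05]            # illustrative b_1..b_4
a = invert_coefficients(b, 4)         # a_1..a_4 from (201)

b1, b2, b3, b4 = b
a_closed = [
    -b1,
    -b2 + b1 ** 2,
    -b3 + 2 * b1 * b2 - b1 ** 3,
    -b4 + 2 * b1 * b3 - 3 * b1 ** 2 * b2 + b2 ** 2 + b1 ** 4,
]
b_back = invert_coefficients(a, 4)    # the relation is symmetric in (a) and (b)
```

Since (201) is symmetric in the two sets, the same routine serves in both directions; applying it twice returns the coefficients started from.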


Keeping the notations used in section 19, we write

(203)    $K^2 = 1 + b_1^2 + b_2^2 + \dots$,

and remark that the vanishing of the singular component implies
that $K = 1/\kappa$. Thus we have


(204)    $D(\xi) = K \cdot D(\eta)$,

while the general formulae (163) and (177) reduce to

    $K \cdot r(\xi(t+n); \eta(t)) = b_n$,

(205)    $r_k = r_k(\xi) = (b_k + b_1 \cdot b_{k+1} + b_2 \cdot b_{k+2} + \dots)/K^2$,

writing shortly $r_k$ for $r_k(\xi)$.
Next, we shall derive a fundamental set of relations between the
sequences $(a_k)$, $(b_k)$ and $(r_k)$. Multiplying the two identities

(206)    $\xi(t+s) = \eta(t+s) + b_1 \cdot \eta(t+s-1) + b_2 \cdot \eta(t+s-2) + \dots$
(207)    $\xi(t) + a_1 \cdot \xi(t-1) + a_2 \cdot \xi(t-2) + \dots = \eta(t)$

and forming the expectations of the resulting two members, we
obtain in the case of a negative $s$ in (206)

(208)    $r_k + a_1 \cdot r_{k-1} + \dots + a_{k-1} \cdot r_1 + a_k + a_{k+1} \cdot r_1 + a_{k+2} \cdot r_2 + \dots = 0$,

for all $k > 0$. In the same way, taking $s \ge 0$, we get

(209)    $(1 + a_1 r_1 + a_2 r_2 + \dots) \cdot D^2(\xi) = D^2(\eta)$,

and

(210)    $r_k + a_1 \cdot r_{k+1} + a_2 \cdot r_{k+2} + \dots = b_k / K^2$,    $k \ge 0$.

Using the relations (149)-(152), we shall next derive a set of
forecast formulae. For this purpose, we shall consider the variable
$\xi(t+k)$ as conditioned by the development of the process up to the
time point $t$ inclusive. Writing

(211) FcE« + 4)] = J?Uf(t + *)] f 

where $(C) = (\xi(t-k) = \xi_{t-k},\ \eta(t-k) = \eta_{t-k};\ k = 0, 1, 2, \dots)$, we
have first the formal relations

(212)    $\xi_{t-k} = \eta_{t-k} + b_1 \cdot \eta_{t-k-1} + b_2 \cdot \eta_{t-k-2} + \dots$;    $k = 0, 1, 2, \dots$
         $\eta_{t-k} = \xi_{t-k} + a_1 \cdot \xi_{t-k-1} + a_2 \cdot \xi_{t-k-2} + \dots$;    $k = 0, 1, 2, \dots$

Since the variables $\eta(t)$ are uncorrelated, (150) gives the linear
forecast

(213)    $F_t[\xi(t+k)] = b_k \cdot \eta_t + b_{k+1} \cdot \eta_{t-1} + b_{k+2} \cdot \eta_{t-2} + \dots$;    $k = 1, 2, \dots$

As verified below, we have further
(214)    $F_t[\xi(t+k)] = -a_1 \cdot F_t[\xi(t+k-1)] - a_2 \cdot F_t[\xi(t+k-2)] - \dots$
         $\dots - a_{k-1} \cdot F_t[\xi(t+1)] - a_k \cdot \xi_t - a_{k+1} \cdot \xi_{t-1} - a_{k+2} \cdot \xi_{t-2} - \dots$

This relation, which makes possible a successive calculation of the
linear forecasts $F_t$, reduces to (213) if every $F_t[\xi(t+k-i)]$ is
written on the form (214), and every $\xi_{t-i}$ is expressed in the values
$\eta_{t-i}$ by means of (212). In fact, for $i \ge 0$ the coefficient of $\eta_{t-i}$
then becomes

    $-a_1 \cdot b_{k+i-1} - a_2 \cdot b_{k+i-2} - \dots - a_{k+i-1} \cdot b_1 - a_{k+i}$,

and according to (201) this expression equals $b_{k+i}$.
Alternatively, we may express the forecasts $F_t$ in terms of the
values $\xi_{t-i}$. Writing

(215)    $F_t[\xi(t+k)] = f_{k,0} \cdot \xi_t + f_{k,1} \cdot \xi_{t-1} + f_{k,2} \cdot \xi_{t-2} + \dots$,

we record in the first place

    $f_{1,i} = -a_{i+1}$.

Further, inserting (215) in the left member of (214), and writing
also the forecasts in the right member on the same form, we
obtain

(216)    $f_{k,0} + a_1 \cdot f_{k-1,0} + a_2 \cdot f_{k-2,0} + \dots + a_{k-1} \cdot f_{1,0} + a_k = 0$,
         $f_{k,1} + a_1 \cdot f_{k-1,1} + a_2 \cdot f_{k-2,1} + \dots + a_{k-1} \cdot f_{1,1} + a_{k+1} = 0$,
         $\dots$

Thus, after having calculated the coefficients $f_{k-i,j}$ appearing in the
forecasts $F_t[\xi(t+k-i)]$, the relations (216) yield the coefficients
$f_{k,j}$ necessary for computing $F_t[\xi(t+k)]$ in terms of the $\xi$'s.
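The successive calculation described by (214) and the coefficient relations (216) can be sketched as follows. The sketch assumes a finite set of coefficients $a_1, \dots, a_h$ (all later ones vanishing) and a short observed section $\xi_t, \xi_{t-1}, \dots$; names and numbers are illustrative.

```python
def forecasts_recursive(a, past, k_max):
    """Forecasts F_t[xi(t+k)], k = 1..k_max, by the recursion (214).
    past[0] = xi_t, past[1] = xi_{t-1}, ...; a_n = 0 beyond len(a)."""
    def coef(n):
        return a[n - 1] if 1 <= n <= len(a) else 0.0
    F = {}
    for k in range(1, k_max + 1):
        value = -sum(coef(j) * F[k - j] for j in range(1, k))
        value -= sum(coef(k + i) * past[i] for i in range(len(past)))
        F[k] = value
    return F

def forecast_weights(a, k_max, width):
    """Coefficients f_{k,i} of (215), obtained successively from (216):
    f_{k,i} = -a_{k+i} - a_1 f_{k-1,i} - ... - a_{k-1} f_{1,i}."""
    def coef(n):
        return a[n - 1] if 1 <= n <= len(a) else 0.0
    f = {}
    for k in range(1, k_max + 1):
        f[k] = [-coef(k + i) - sum(coef(j) * f[k - j][i] for j in range(1, k))
                for i in range(width)]
    return f

a = [-1.1, 0.5]                 # illustrative a_1, a_2
past = [1.0, 0.4]               # xi_t, xi_{t-1}
F = forecasts_recursive(a, past, 3)
f = forecast_weights(a, 3, len(past))
# The same forecasts written on the form (215).
direct = {k: sum(f[k][i] * past[i] for i in range(len(past))) for k in f}
```

Both routes give the same forecasts; for these numbers $F_t[\xi(t+1)] = 1.1 \cdot 1.0 - 0.5 \cdot 0.4 = 0.9$ and $F_t[\xi(t+2)] = 1.1 \cdot 0.9 - 0.5 \cdot 1.0 = 0.49$.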

The relations (213)-(215) will be referred to as the forecasting
formulae. The sample series sections $(\xi_t, \xi_{t-1}, \dots)$ or (and)
$(\eta_t, \eta_{t-1}, \eta_{t-2}, \dots)$ being given, these formulae furnish the best linear
forecast as to the future development of the series, viz. in the
following sense.

A particular sample series section, say $(C) = (\xi_t, \xi_{t-1}, \dots)$
being given, there exists a set of constants $f_{k,0}(C), f_{k,1}(C), f_{k,2}(C), \dots$
minimizing the expectation

(217)    $E[[\xi_C(t+k) - f_{k,0}(C) \cdot \xi_t - f_{k,1}(C) \cdot \xi_{t-1} - \dots]^2]$.

In general, the constants $f_{k,i}(C)$ will depend on $(C)$, and differ from
the $f_{k,i}$'s. Now, considering on the other hand the expectation

    $E[[\xi_C(t+k) - f_{k,0} \cdot \xi_t - f_{k,1} \cdot \xi_{t-1} - f_{k,2} \cdot \xi_{t-2} - \dots]^2]$,

our $f_{k,i}$'s possess the property that the (weighted) average of
this expression based on all sets $(C)$ becomes a minimum. This
minimum is

(218)    $(1 + b_1^2 + b_2^2 + \dots + b_{k-1}^2) \cdot D^2(\eta)$,

a formula which shows clearly the scope of the forecast method
under consideration. A forecast $F_t[\xi(t+k)]$ over $k$ time units is
decreasing in efficiency as $k$ increases. As $k \to \infty$, the expression
(218) tends to $K^2 \cdot D^2(\eta) = D^2(\xi(t))$. In other words, for large $k$-values
the forecast $F_t[\xi(t+k)]$ is approximately of the same efficiency as
the trivial forecast $E[\xi(t+k)] = E[\xi(t)] = 0$.
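The loss of efficiency expressed by (218) is easily tabulated. The sketch below assumes the recursion $b_0 = 1$, $b_k = -(a_1 b_{k-1} + \dots + a_h b_{k-h})$ for the coefficients and an illustrative equation of second order; it computes the multiplier of $D^2(\eta)$ in (218) for $k = 1, \dots, 40$ and compares the last value with $K^2$ as defined by (203).

```python
def b_coefficients(a, n):
    """b_0..b_n with b_0 = 1 and b_k = -(a_1 b_{k-1} + ... + a_h b_{k-h}),
    dropping terms of negative index (assumed reading of (96)-(97))."""
    h = len(a)
    b = [1.0]
    for k in range(1, n + 1):
        b.append(-sum(a[j - 1] * b[k - j] for j in range(1, min(h, k) + 1)))
    return b

def forecast_error_factor(a, k):
    """1 + b_1^2 + ... + b_{k-1}^2, the multiplier of D^2(eta) in (218)."""
    return sum(x * x for x in b_coefficients(a, k - 1))

a = [-1.1, 0.5]                                    # illustrative a_1, a_2
factors = [forecast_error_factor(a, k) for k in range(1, 41)]
K2 = sum(x * x for x in b_coefficients(a, 400))    # truncated sum for (203)
```

For a forecast one step ahead the factor is 1, and it rises steadily towards $K^2$, which for these coefficients is about 2.88.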

If $\{\eta(t)\}$ is purely random, it follows from (152) that we have

+ + ft)]. 

In this case, the coefficients $f_{k,i}(C)$ appearing in (217) are independent
of $(C)$,

    $f_{k,i}(C) = f_{k,i}$.

It is seen that (213)-(215) then give for every $(C)$ a forecast which
is the best one according to the principle of least squares. 13


24. On the process of linear autoregression. General developments. 

As already mentioned, we shall in the present section investigate
in some detail the process of linear autoregression as dealt with in
the sections 15 $\delta$ and 22. Denoting the process by $\{\xi(t)\}$, the defining
relation will be written

(219)    $\{\xi(t)\} + a_1 \{\xi(t-1)\} + \dots + a_h \{\xi(t-h)\} = \{\eta(t)\}$.

It will be observed that the process $\{\eta(t)\}$ need not be purely
random; the following developments are valid under the broader
assumption that the stationary process $\{\eta(t)\}$ is non-autocorrelated.
As before, we shall assume that $E[\eta(t)] = E[\xi(t)] = 0$.

The formal developments given in the previous section are all
valid in the present case. In fact, in the sequence $(a_n)$ we have
$a_n = 0$ for $n > h$, so the serial developments are only apparently
infinite. In order to arrive at more precise knowledge, we shall
next consider these developments in some detail.

As to the $b$-series, we already know from the analysis in section
22 of the general stochastical difference equation that $b_k$ in the
present case is of type (33), that the oscillations are damped,
and that the series does not satisfy any linear difference equation
of lower order than $h$.

Formula (209) gives

(220)    $(1 + a_1 r_1 + \dots + a_h r_h) \cdot D^2(\xi) = D^2(\eta)$.

As to the autocorrelation coefficients, we obtain from (208) and
(210) the following three groups of relations.

(221)    $r_k + a_1 \cdot r_{k-1} + a_2 \cdot r_{k-2} + \dots + a_{h-1} \cdot r_{k-h+1} + a_h \cdot r_{k-h} = 0$
         $\dots$
         $r_{h+1} + a_1 \cdot r_h + a_2 \cdot r_{h-1} + \dots + a_{h-2} \cdot r_3 + a_{h-1} \cdot r_2 + a_h \cdot r_1 = 0$
         $r_h + a_1 \cdot r_{h-1} + a_2 \cdot r_{h-2} + \dots + a_{h-2} \cdot r_2 + a_{h-1} \cdot r_1 + a_h = 0$

(222)    $r_{h-1} + a_1 \cdot r_{h-2} + a_2 \cdot r_{h-3} + \dots + a_{h-2} \cdot r_1 + a_{h-1} + a_h \cdot r_1 = 0$
         $\dots$
         $r_1 + a_1 + a_2 \cdot r_1 + \dots + a_{h-2} \cdot r_{h-3} + a_{h-1} \cdot r_{h-2} + a_h \cdot r_{h-1} = 0$

(223)    $1 + a_1 \cdot r_1 + a_2 \cdot r_2 + \dots + a_{h-2} \cdot r_{h-2} + a_{h-1} \cdot r_{h-1} + a_h \cdot r_h = 1/K^2$
         $r_1 + a_1 \cdot r_2 + a_2 \cdot r_3 + \dots + a_{h-1} \cdot r_h + a_h \cdot r_{h+1} = b_1/K^2$
         $\dots$
         $r_k + a_1 \cdot r_{k+1} + a_2 \cdot r_{k+2} + \dots + a_{h-1} \cdot r_{k+h-1} + a_h \cdot r_{k+h} = b_k/K^2$

The first group is given in the paper of Sir G. Walker (1931) 

" *" ed h >- We the some paper the observe- 

turns that (82) constitutes a difference equation satisfied by the rr 
senes tor ii h, that this relation is the same as that satisfied by 

fA SCrieS * hM P"* 6 ”* o»oillatio„s 

of_ type (33 ). Sir G. Walker mentions further that the observa- 

* f -TT ((1936) ’^ 35) . giVeS the rektion which is correct only in 

ZL~ h * tl ° n ° f the eXpeCtation *^°ns (127) is based on 
tion that the $r$-series satisfies the difference equation (32) has already
been given by G. U. Yule (1927) for the special case $h = 1$.

The second group (222) contains $h - 1$ relations, and involves the
autocorrelation coefficients $r_1, r_2, \dots, r_{h-1}$. Now, we obtain the
coefficients $r_1, \dots, r_{h-1}$ directly in terms of the coefficients $a_k$ by
solving the system (222). This important fact may be regarded as
a corollary to the following theorem. To see this we assume to the
contrary that equations (222) were connected by a linear relation,
say with coefficients $A_1, \dots, A_{h-1}$. An inspection of (222) with
regard paid to (224)-(225) then shows that the $r_k$ would satisfy a
difference equation of order $h - 1$, and this would contradict the
theorem.
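The solution of the system (222), followed by the continuation (221), can be sketched numerically. In the sketch below the relations for $k = 1, \dots, h-1$ are written in the common form $\sum_{j=0}^{h} a_j r_{|k-j|} = 0$ with $a_0 = r_0 = 1$, which is how (222) reads when the symmetry of the autocorrelation coefficients is used; the small elimination routine and all names are illustrative.

```python
def solve_linear(M, v):
    """Gauss-Jordan elimination with partial pivoting for a small system M x = v."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]

def autocorrelations(a, n):
    """r_0..r_n: r_1..r_{h-1} from the system (222), the rest from (221)."""
    h = len(a)
    coef = [1.0] + list(a)                       # a_0 = 1, a_1, ..., a_h
    M = [[0.0] * (h - 1) for _ in range(h - 1)]
    rhs = [0.0] * (h - 1)
    for k in range(1, h):
        for j in range(h + 1):
            m = abs(k - j)
            if m == 0:
                rhs[k - 1] -= coef[j]            # term a_k * r_0 moved to the right
            else:
                M[k - 1][m - 1] += coef[j]       # coefficient of r_m
    r = [1.0] + (solve_linear(M, rhs) if h > 1 else [])
    for k in range(h, n + 1):                    # continuation by (221)
        r.append(-sum(a[j - 1] * r[k - j] for j in range(1, h + 1)))
    return r[:n + 1]

a2 = [-1.1, 0.5]
r2 = autocorrelations(a2, 6)                     # h = 2: r_1 = -a_1 / (1 + a_2)

a3 = [-0.5, 0.2, -0.1]
r3 = autocorrelations(a3, 6)
check3 = [sum(c * r3[abs(k - j)] for j, c in enumerate([1.0] + a3)) for k in (1, 2)]
```

For $h = 2$ the system (222) reduces to the single relation $r_1 + a_1 + a_2 r_1 = 0$, and the first relation of (223) then gives $1/K^2 = 1 + a_1 r_1 + a_2 r_2$.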

Theorem 10. Let $\{\xi(t)\}$ be a process of linear autoregression of
order $h$. Then the autocorrelation coefficients of $\{\xi(t)\}$ satisfy no
difference equation of type (32) of lower order than $h$.

For the proof, let us write $r_k$ on the form (33), and consider the
values, say $r(k)$, taken on by this function for $k = 0, -1, -2, \dots$
According to (177), we have $m = 0$. Next, comparing the relation

    $r_h + a_1 \cdot r_{h-1} + \dots + a_{h-1} \cdot r_1 + a_h \cdot r(0) = 0$

with the last equation in the system (221), we find $a_h \cdot r(0) = a_h$.
Since $a_h \ne 0$, we conclude

(224)    $r(0) = 1$.

Comparing further the first equation in the system (222) with

    $r_{h-1} + a_1 \cdot r_{h-2} + \dots + a_{h-2} \cdot r_1 + a_{h-1} \cdot r(0) + a_h \cdot r(-1) = 0$,

and paying regard to (224), we obtain $r(-1) = r_1$. This procedure
may be continued $h - 1$ steps, which gives

(225)    $r(-1) = r_1$,  $r(-2) = r_2$,  $\dots$,  $r(-h+1) = r_{h-1}$.

Proceeding one step further, we arrive at the first equation in the
system (223), and obtain $a_h \cdot r(-h) + 1/K^2 = a_h \cdot r_h$, from which we
conclude

(226)    $r(-h) \ne r_h$.
Now, considering the function $d_k$ defined for $k = 0, \pm 1, \pm 2,
\dots$ by

(227)    $d_k = r(k) - r(-k)$,

the relations (224)-(226) yield

(228)    $d_{-h} \ne 0$;  $d_{-h+1} = d_{-h+2} = \dots = d_{-1} = d_0 = d_1 = \dots = d_{h-1} = 0$;
         $d_h \ne 0$.

On the other hand, it follows from (227) that $d_k$ satisfies a difference
equation of type (32) of an order twice that of the equation satisfied
by $r_k$. However, by the argument used in theorem 8 it follows
from (228) that $d_k$ cannot possibly satisfy an equation (32) of lower
order than $2h$. Thus $r_k$ will satisfy no equation of type (32)
of lower order than $h$, a reflection which completes the proof.

Among the general developments in section 23 there remains for
consideration the forecasting formulae. Since $a_k = 0$ for $k > h$,
formula (214) shows that the forecasts $F_t[\xi(t+1)]$, $F_t[\xi(t+2)]$, $\dots$,
$F_t[\xi(t+k)]$, $\dots$ concerning the time points $t+1$, $t+2$, $\dots$, $t+k$,
$\dots$, forecasts based on the development up to the time point $t$,
will satisfy the equation (32) in respect of $k$. Thus, the forecasts,
too, will form a damped oscillation. Explaining in the terminology
of the theory of oscillatory mechanisms, the forecast curve describes
how the mechanism would develop out from the situation arrived
at in the time point $t$ if there were no external influence present
in the future time points $t+1$, $t+2$, $\dots$ As by hypothesis the
intrinsic oscillations of the mechanism are damped, the forecast
curve will also be damped, in full agreement with the above. Thus,
$F_t[\xi(t+k)] \to 0 = E[\xi(t)]$ as $k \to \infty$, in agreement with the concluding
remark of section 23.

Considering, finally, the relations (216), it will be observed that
the coefficients $f_{k,i}$ for all $i$ satisfy the difference equation (32) in
respect of $k$.

Next, we shall illustrate some points of the general analysis in
Chapter II by means of the process of linear autoregression.

Keeping in mind that in the present case $\sum |r_k|$ is convergent,
we may apply the corollary to theorem 5 (see p. 69). We conclude
that the generating function $W(x)$ for all $x$ in $(0, \pi)$ has a bounded
derivative given by (121). In order to transform this expression
upon a finite form we write

    $G(x) = \sum_{k=0}^{\infty} r_k \cdot e^{ikx} = e^{ihx} \cdot \sum_{t=-h}^{\infty} r_{t+h} \cdot e^{itx}$.

Considering the identity

    $G(x) \cdot [e^{-ihx} + a_1 \cdot e^{-i(h-1)x} + \dots + a_{h-1} \cdot e^{-ix} + a_h] =$
    $= \sum_{t=-h}^{\infty} r_{t+h} \cdot e^{itx} + a_1 \cdot \sum_{t=-h+1}^{\infty} r_{t+h-1} \cdot e^{itx} + \dots$
    $\dots + a_{h-1} \cdot \sum_{t=-1}^{\infty} r_{t+1} \cdot e^{itx} + a_h \cdot \sum_{t=0}^{\infty} r_t \cdot e^{itx}$,

and paying regard to the relations (221), the right member reduces
to

    $e^{-ihx} + (a_1 + r_1) \cdot e^{-i(h-1)x} + (a_2 + a_1 r_1 + r_2) \cdot e^{-i(h-2)x} + \dots$
    $\dots + (a_{h-1} + a_{h-2} r_1 + \dots + a_1 r_{h-2} + r_{h-1}) \cdot e^{-ix}$.

Thus we obtain $G(x)$ on the finite form

    $G(x) = \frac{1 + (a_1 + r_1) \cdot e^{ix} + \dots + (a_{h-1} + a_{h-2} r_1 + \dots + a_1 r_{h-2} + r_{h-1}) \cdot e^{i(h-1)x}}{1 + a_1 \cdot e^{ix} + a_2 \cdot e^{2ix} + \dots + a_h \cdot e^{ihx}}$,

while $W'(x)$ is given by

(229)    $W'(x) = G(x) + G(-x) - 1 = 2 \cdot R[G(x)] - 1$,

where $R[G(x)]$ stands for the real part of $G(x)$.
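The finite form can be compared with the defining series in a simple case. The sketch below takes $h = 2$, where the numerator of $G(x)$ reduces to $1 + (a_1 + r_1) e^{ix}$ and $r_1 = -a_1/(1 + a_2)$ follows from (222); it assumes that (229) is equivalent to the series $1 + 2 \sum_{k \ge 1} r_k \cos kx$, as results from inserting the definition of $G(x)$. Names and coefficients are illustrative.

```python
import cmath
import math

def autocorr_order2(a1, a2, n):
    """r_0..r_n for h = 2: r_1 from (222), later values from (221)."""
    r = [1.0, -a1 / (1.0 + a2)]
    for k in range(2, n + 1):
        r.append(-a1 * r[k - 1] - a2 * r[k - 2])
    return r

def G_closed(a1, a2, x):
    """Finite form of G(x) for h = 2."""
    r1 = -a1 / (1.0 + a2)
    z = cmath.exp(1j * x)
    return (1.0 + (a1 + r1) * z) / (1.0 + a1 * z + a2 * z * z)

def W_prime_closed(a1, a2, x):
    """(229): W'(x) = 2 * Re G(x) - 1."""
    return 2.0 * G_closed(a1, a2, x).real - 1.0

def W_prime_series(a1, a2, x, n=400):
    """Truncated series 1 + 2 * sum_{k=1}^{n} r_k cos(kx)."""
    r = autocorr_order2(a1, a2, n)
    return 1.0 + 2.0 * sum(r[k] * math.cos(k * x) for k in range(1, n + 1))

a1, a2 = -1.1, 0.5
grid = [0.1 * i for i in range(1, 31)]
gaps = [abs(W_prime_closed(a1, a2, x) - W_prime_series(a1, a2, x)) for x in grid]
values = [W_prime_closed(a1, a2, x) for x in grid]
```

On this grid the two evaluations agree up to rounding, and the values stay positive and bounded, as the non-vanishing denominator leads one to expect.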

Denoting by $x_i$ the roots of (34), the roots of $1 + a_1 x + \dots
+ a_h x^h = 0$ are seen to equal $1/x_i$. Since these roots are lying
outside the periphery of the unit circle, the denominator of $G(x)$
is evidently non-vanishing, which is in full agreement with the
earlier observation that $W'(x)$ is bounded. Now, paying regard to
the relations (127) and (129), and summing up the main results, we
obtain the following theorem.

Theorem 11. The generating function $W(x)$ of the autocorrelation
coefficients in a process $\{\xi(t)\}$ of linear autoregression is absolutely
continuous, and has a bounded derivative $W'(x)$ given by (229). The
expectation $E[C^2(n; \lambda)]$ of an arbitrary ordinate in the Schuster
periodogram, as defined by (126), is given by

    $E[C^2(n; \lambda)] = \frac{4 D^2(\xi)}{n} \cdot W'(\lambda) + O(1/n^2)$.
n 
It should be noticed that the expectation is of the same order
of magnitude as in the case of a purely random series (cf. (37)). 
When extending the analysed series, the periodogram ordinates 
thus tend to vanish. 

In view of the applications of the theory of linear autoregres- 
sion, it is of fundamental importance to investigate the possibilities 
of drawing conclusions from a sample series $(\xi_t, \xi_{t-1}, \xi_{t-2}, \dots)$ upon
the characteristic equation (191), or — interpreting in the language 
of the theory of oscillatory mechanisms — to investigate whether 
the past development of the mechanism as influenced by random 
external factors can give any information about its intrinsic oscil- 
latory tendencies. In particular, is it possible to find out the 
periods of the intrinsic damped oscillations? 

The previous analysis shows that the classical periodogram ana- 
lysis is an inadequate method for the search of intrinsic oscillations 
— the longer the series analysed, the poorer the results. This con- 
clusion holds both for the Schuster and the Whittaker periodo- 
grams since they are of equal efficiency (see section 8). In the 
illustrations given in the next section, these periodogram questions 
are dealt with in more detail. 

As has been emphasized by G. U. Yule (1927) and Sir G. Wal- 
ker (1931), an adequate tool for the search of the intrinsic pro- 
perties of an oscillatory mechanism is yielded by the serial coeffi- 
cients of the time series investigated. In fact, a serial coefficient 
of order $k$ approximates the corresponding autocorrelation coefficient $r_k$, and
we know that the graph of $r_k$ presents exactly those damped oscillations
which are characteristic of the mechanism considered: there
is conformity in respect to both the frequencies of the individual 
components of the oscillations and their damping exponential fac- 
tors. Since a periodogram analysis is concerned with only the 
intrinsic frequencies, it is of particular importance that the serial 
coefficients can give information even about the damping factors. 

Having above derived expressions for the autocorrelation coeffi- 
cients and other characteristics connected with a process of linear 
autoregression, we are, in view of the applications, confronted with 
problems of an inverse type. In particular, it is seen that if the 
autocorrelation coefficients of a process of autoregression are known, 
the coefficients $(a)$ will be obtained from the system (221)-(222).
After having derived the coefficients $(a)$ we obtain the primary
process $\{\eta(t)\}$ in terms of the process considered by means of formula
(200). Concluding that the inverse problem mentioned involves
no difficulty in point of principle, reference is given to the next
section and Chapter IV for illustrations.

A question now presenting itself concerns the reliability of the 
information yielded by the serial coefficients. Here we meet at 
once the same obstacles as in all significance problems concerning 
autocorrelated time series. In the first place, we notice that when 
forming the sampling variance of a serial coefficient, we arrive at a 
complicated expression involving i. a. an extensive sum of correlation 
coefficients between different serial coefficients. The difficulties 
of the problem having already been mentioned by G. U. Yule
(1927), E. Slutsky (1934) has presented a large collection of for- 
mulae concerning the dispersion of various characteristics derived 
from sample series sections, formulae deduced under the assumption 
that the variables considered are normally distributed. 

However, it should be observed that the relevant problem does 
not consist merely in calculating the variance or the distribution 
of a single autocorrelation coefficient. We have also to face the 
much deeper question concerning the reliability of the periodicities 
which present themselves in the graph of the serial coefficients. In 
view of the complications already occurring in sampling problems 
involving merely one individual serial coefficient, the possibility of 
arriving at a practicable, quantitative measure of significance in 
this connexion seems, at least for the moment, hopeless. 14 

There is also another essential difficulty. Correlation in time 
series, and correlation as considered in the classical applications, 
differ as to their quantitative significance (see H. Wold (1936)). As 
a matter of fact, in the former case the correlation coefficients are, 
as a rule, quantitatively conditioned by the size of the statistical 
masses to which the coefficients refer. In order to give an example, 
we advance that certain business cycle data yield nice graphs of 
serial coefficients (see Chapter IV). For instance, the serial coeffi- 
cients of the G. Myrdal index 1830 — 1913 of the cost of living 
in Sweden show a clear damped oscillation (see e. g. fig. 14). Let it 
now be imagined that another index had been computed by the 
same method, covering the same period (84 years), but referring
merely to a small part of Sweden. In the second case there
would, of course, be a larger random element present in the index. 
A short reflection will show that the increase of the random element 
is accompanied by a systematic tendency to a diminishing of the
serial coefficients. Consider, on the other hand, a classical
application such as the correlation between cranial indices. Here
the statistical units, the skulls, are uniquely determined, unmodifiable;
in a material of say 84 skulls the correlation coefficient can be
calculated in only one way. In other words, while we cannot 
possibly form a coefficient referring to a certain part of each of 
the 84 skulls, there is nothing illegitimate in the modifying of the 
84 statistical units in the case of the time series correlation 
considered above. Referring to Appendix B of the 1st ed. of this book 
for further comment, we note by way of a general conclusion that 
in a theoretical autocorrelation model the size of the statistical 
mass must also be taken into consideration. 
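
The diminishing of the serial coefficients can be made concrete under one explicit assumption that the text does not spell out: the regional index equals the national index plus an independent, non-autocorrelated random element. The autocovariances at lags k >= 1 are then unchanged while the variance grows, so every serial coefficient shrinks by the same factor. A minimal Python sketch (function and parameter names are hypothetical, and the damped oscillation is made-up data):

```python
import math


def attenuated_serial_coefficients(r, signal_var, noise_var):
    """Autocorrelations of (signal + independent white noise).

    r          -- autocorrelations r_1, r_2, ... of the undisturbed series
    signal_var -- variance of the undisturbed series
    noise_var  -- variance of the added random element (assumed white and
                  independent of the signal; an illustrative assumption)
    """
    shrink = signal_var / (signal_var + noise_var)
    return [shrink * rk for rk in r]


# Made-up damped oscillation; random element as large as the signal itself.
r = [0.9 ** k * math.cos(k * math.pi / 4) for k in range(1, 6)]
halved = attenuated_serial_coefficients(r, signal_var=1.0, noise_var=1.0)
```

The shape of the correlogram survives, but its amplitude depends on the size of the statistical mass, which is the point made above.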

Summing up, the intricacy of the problems of significance, and 
the dependence of the correlation coefficients on the statistical mass, 
constitute two fundamental difficulties in quantitative autocorrela- 
tion analysis. The situation seems to justify the opinion that the 
utmost caution is necessary when drawing quantitative conclusions 
from observational time series — a hypothesis should not be 
considered safe unless corroborated by empirical series obtained 
from different and, if possible, independent statistical masses, and 
supported by aprioristic considerations independent of the statistical 
evidence.

The applications to observational data presented in Chapter IV 
are far from aiming at quantitative results, the purpose being 
more to exemplify the qualitative differences between the scheme 
of hidden periodicities and the schemes of linear regression. Since 
we attach no importance to the quantitative outcomes, the signi- 
ficance problems will not be entered upon in the present study. 


25. On some special cases of linear autoregression. 

Keeping the assumptions of the previous section, we shall in this
section consider the special cases obtained when putting h = 1 and
h = 2 in the general definition (219) of linear autoregression. The
resulting formulae will be readily surveyable, and give rise to a
few remarks of general scope. In a few instances the model series
given in section 15 will be used for the illustrations.

Attaching the analysis to the formula (219), we take h = 2, and
denote the roots of the characteristic equation (191) by p and q.
Thus we have

\[ a_1 = -(p + q), \quad a_2 = p \cdot q, \quad a_n = 0 \ \text{for}\ n > 2; \qquad |p| < 1, \ |q| < 1. \tag{230} \]

As the coefficients $a_k$ must be real, we notice that either I. p and q
are real, or II. p = A + iB, q = A - iB, where A and B represent
real numbers such that

\[ A^2 + B^2 = |p|^2 = |q|^2 < 1. \tag{231} \]

We shall assume that $p \ne q$, gaining thereby the general solution
of the difference equation (32) to be (cf. also p. 146)

\[ P_1 \cdot p^t + P_2 \cdot q^t, \tag{232} \]

where $P_1$ and $P_2$ are arbitrary. In case II, this expression may
be written (cf. (33))

\[ Q_1 \cdot C^t \cos \lambda_1 t + Q_2 \cdot C^t \sin \lambda_1 t, \]

where $Q_1$ and $Q_2$ are arbitrary, and

\[ C = +\sqrt{A^2 + B^2}; \qquad \cos \lambda_1 = A/C, \quad 0 < \lambda_1 < \pi. \]

Cases I and II will be dealt with separately.

I. p and q are real.

Inserting the general expression (232) for $b_1$ and $b_2$ in the system
(97), and solving for the constants $P_1$ and $P_2$, we obtain readily

\[ b_k = \frac{p}{p-q} \cdot p^k + \frac{q}{q-p} \cdot q^k = \frac{p^{k+1} - q^{k+1}}{p - q}, \qquad k \ge 0. \tag{233} \]

Next, insertion of this expression in (203) yields

\[ K^2 = \frac{D^2(\zeta)}{D^2(\eta)} = \frac{1 + pq}{(1 - p^2)(1 - q^2)(1 - pq)}. \]

The system (222) reduces to the single equation

\[ r_1 + a_1 + a_2 \cdot r_1 = 0. \tag{234} \]

Solving for $r_1$, observing that $r_0 = 1$ (cf. formula (224)), equating
these two coefficients to the general expression (232) for $r_0$ and $r_1$,
and solving the linear system thus obtained for the constants $P_1$
and $P_2$, we get

\[ r_k = \frac{p(1 - q^2)}{(p - q)(1 + pq)} \cdot p^k + \frac{q(1 - p^2)}{(q - p)(1 + pq)} \cdot q^k. \tag{235} \]

It is readily verified that (205) is satisfied. Using (230) and (235) in
the formula (229), we find for the derivative of the generating
function in the interval $(0, \pi)$

\[ W'(\lambda) = \frac{(1 - p^2)(1 - q^2)(1 - pq)}{(1 + pq)(1 + p^2 - 2p\cos\lambda)(1 + q^2 - 2q\cos\lambda)} = \frac{1}{K^2 \cdot (1 + p^2 - 2p\cos\lambda)(1 + q^2 - 2q\cos\lambda)}, \tag{236} \]

and a short calculation will verify (121).
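
As a numerical cross-check of the formulae of case I, the following Python sketch evaluates (230), (233), (235) and the variance ratio for a pair of distinct real roots, and tests them against two properties they must have: both sequences obey the difference equation with coefficients $a_1$, $a_2$, and the variance ratio equals the sum of the squared $b_k$. The function name and the choice p = 0.8, q = -0.6 are illustrative assumptions.

```python
def ar2_real_roots(p, q, kmax=6):
    """Quantities of case I for distinct real roots p, q with |p|, |q| < 1."""
    a1, a2 = -(p + q), p * q                                   # (230)
    b = [(p ** (k + 1) - q ** (k + 1)) / (p - q)               # (233)
         for k in range(kmax + 1)]
    r = [(p * (1 - q * q) * p ** k - q * (1 - p * p) * q ** k)
         / ((p - q) * (1 + p * q))                             # (235)
         for k in range(kmax + 1)]
    # D^2(zeta) / D^2(eta), the ratio obtained from (203)
    var_ratio = (1 + p * q) / ((1 - p * p) * (1 - q * q) * (1 - p * q))
    return a1, a2, b, r, var_ratio


a1, a2, b, r, var_ratio = ar2_real_roots(0.8, -0.6, kmax=200)

# b_k and r_k both satisfy x_k + a1 * x_{k-1} + a2 * x_{k-2} = 0 for k >= 2.
recursion_ok = all(
    abs(s[k] + a1 * s[k - 1] + a2 * s[k - 2]) < 1e-12
    for s in (b, r) for k in range(2, 20)
)
# The truncated sum of b_k^2 approximates the variance ratio.
sum_b2 = sum(x * x for x in b)
```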

Theorem 11 gives for the expectation of an arbitrary ordinate
in the Schuster periodogram

\[ E[C^2(n, \lambda)] = \frac{4 D^2(\eta)}{n (1 + p^2 - 2p\cos\lambda)(1 + q^2 - 2q\cos\lambda)} + \cdots \]

The following special cases are instructive.

1) q = 0. In this case the relations reduce to

\[ \zeta(t) - p\,\zeta(t-1) = \eta(t), \]

\[ b_k = r_k = p^k, \qquad D^2(\zeta) = D^2(\eta)/(1 - p^2), \tag{238} \]

\[ W'(\lambda) = \frac{1 - p^2}{1 + p^2 - 2p\cos\lambda}, \qquad E[C^2(n, \lambda)] \sim \frac{4 D^2(\eta)}{n (1 + p^2 - 2p\cos\lambda)}. \tag{239} \]

These formulae, which cover the case h = 1, have been given earlier
by Sir G. Walker ((1931), p. 521). Walker gives also the variance
formula (220) for an arbitrary h (see his formula …).

2) q = -p. We obtain without difficulty

\[ \zeta(t) - p^2\,\zeta(t-2) = \eta(t), \]

\[ b_{2k} = r_{2k} = p^{2k}, \qquad b_{2k+1} = r_{2k+1} = 0, \qquad k \ge 0, \]

\[ D^2(\zeta) = D^2(\eta)/(1 - p^4), \]

\[ W'(\lambda) = \frac{1 - p^4}{(1 + p^2)^2 - 4p^2\cos^2\lambda}, \qquad E[C^2(n, \lambda)] = \frac{4 D^2(\eta)}{n \cdot [(1 + p^2)^2 - 4p^2\cos^2\lambda]} + O(1/n). \tag{240} \]

Fig. 2. Generating function derivatives obtained from formulae (236) and (239).



The graph above shows the curves $W'(\lambda)$ which belong to the
processes defined by

a) formula (103), with p = -·8, q = 0 (thin line),

b) formula (104), with p = ·8, q = 0 (thick line),

c) p = ·8, q = -·6 (broken line).

The graph contains, for comparison, the line $W'(\lambda) = 1$ which
represents the derivative of the generating function in the case of
a purely random process (cf. the remark attached to formula (131)).

II. p and q are conjugate complex,

(241) p = A + iB, q = A — iB. 

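Since the above developments are perfectly general, we have now
only to insert (241) in the formulae for $b_k$, $r_k$, etc. given under I.
After elementary transformations we get for $k \ge 0$

\[ b_k = C^k \cos k\lambda_1 + \frac{A}{B} \cdot C^k \sin k\lambda_1, \tag{242} \]

\[ r_k = C^k \cos k\lambda_1 + \frac{A}{B} \cdot \frac{1 - C^2}{1 + C^2} \cdot C^k \sin k\lambda_1, \tag{243} \]

\[ D^2(\zeta) = \frac{1 + C^2}{(1 - C^2)(1 + C^4 - 2A^2 + 2B^2)} \cdot D^2(\eta). \tag{244} \]

Writing

\[ J(\lambda) = 4C^2 \left[ \frac{A(1 + C^2)}{2C^2} - \cos\lambda \right]^2 + \frac{B^2}{C^2} (1 - C^2)^2, \]

we get

\[ W'(\lambda) = \frac{D^2(\eta)}{D^2(\zeta) \cdot J(\lambda)} \tag{245} \]

and

\[ E[C^2(n, \lambda)] = \frac{4 D^2(\eta)}{n \cdot J(\lambda)} + O(1/n). \]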



Possessing now a collection of formulae sufficient for our purposes,
a partial check is obtained by observing that when putting A = 0
in the above expressions, we get the same result as when replacing
p by i · p in the formulae I, 2.
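
A further check, sketched here in Python, is to evaluate the trigonometric forms (242) and (243) and to compare them with the expressions of case I evaluated in complex arithmetic for p = A + iB, q = A - iB. The function names are hypothetical; B > 0 is assumed so that $0 < \lambda_1 < \pi$.

```python
import math


def case_II(A, B, k):
    """b_k and r_k from the trigonometric forms (242) and (243)."""
    C = math.hypot(A, B)
    lam1 = math.acos(A / C)
    cos_part = C ** k * math.cos(k * lam1)
    sin_part = C ** k * math.sin(k * lam1)
    b_k = cos_part + (A / B) * sin_part
    r_k = cos_part + (A / B) * (1 - C * C) / (1 + C * C) * sin_part
    return b_k, r_k


def case_I_with_complex_roots(A, B, k):
    """The case-I expressions (233) and (235), evaluated for complex p, q."""
    p, q = complex(A, B), complex(A, -B)
    b_k = (p ** (k + 1) - q ** (k + 1)) / (p - q)
    r_k = ((p * (1 - q * q) * p ** k - q * (1 - p * p) * q ** k)
           / ((p - q) * (1 + p * q)))
    return b_k.real, r_k.real


# Largest discrepancy over the first lags for A = 0.8, B = 0.4.
max_gap = max(
    abs(x - y)
    for k in range(8)
    for x, y in zip(case_II(0.8, 0.4, k), case_I_with_complex_roots(0.8, 0.4, k))
)
```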

Proceeding to a first application of the above developments, we
observe that the formulae under II cover the case of an oscillatory
mechanism whose intrinsic oscillations consist of a single damped
harmonic with frequency $\lambda_1$ lying in the interval $0 < \lambda_1 < \pi$, and a
damping factor C. Having earlier found that a periodogram
analysis is ineffective in the case of linear autoregression, we are
now in a position to prove the statement advanced in section 21
concerning the dangers of the periodogram method (see p. 95).

Speaking generally, the question is whether the expectation of
the periodogram ordinate $C^2(n, \lambda)$ presents a maximum when $\lambda = \lambda_i$,
i. e. when the abscissa of the periodogram equals a frequency
characteristic to the intrinsic oscillations of the mechanism. Were
the answer in the affirmative, we should, at least in principle,
be able to use a periodogram construction for the discovery of the
intrinsic frequencies $\lambda_i$. In fact, considering a sample series
$(\zeta_{t-1}, \zeta_{t-2}, \ldots)$, making a sequence of periodograms on the basis of
sections of type $(\zeta_{t-1}, \ldots, \zeta_{t-n})$, $(\zeta_{t-n-1}, \ldots, \zeta_{t-2n})$, ..., and forming
for every abscissa $\lambda$ the average of the corresponding sequence of
periodogram ordinates, the resulting curve would present maxima in
the points $\lambda = \lambda_i$ sought for. However, in order to prove that this
way is barred (in point of principle, thus even if the statistical
material were extensive enough for the construction of the required
set of periodograms) we need only consider the case of a single
intrinsic oscillation covered by the formulae (II) above. The
expectation $E[C^2(n, \lambda)]$ being asymptotically proportional to the
derivative $W'(\lambda)$, it will be sufficient to investigate the extremes
of the latter function.

Fig. 3. Unit circle showing where $W'(\lambda)$ as given by (245) presents
one or no maximum for $0 < \lambda < \pi$ (non-dotted and dotted domains
respectively).

Evidently, $W'(\lambda)$ as given by (245) is maximized by those $\lambda$-values
which minimize the auxiliary function $J(\lambda)$, and vice versa. The
$\lambda$-values in question are seen to be

\[ \lambda = \arccos \frac{A(1 + C^2)}{2C^2}, \quad 0, \quad \pi. \tag{246} \]

The behaviour of $W'(\lambda)$ being different in the cases $|A| < 2C^2/(1 + C^2)$
and $|A| > 2C^2/(1 + C^2)$, the accompanying figure shows the part of the
curve $|A| = 2C^2/(1 + C^2)$ lying in the unit circle C = 1. An analysis
of the second derivative of $J(\lambda)$ gives the following results.

a) $|A| < 2C^2/(1 + C^2)$. Referring to the figure, the roots $A \pm iB$
are in this case lying in the non-dotted part of the unit circle.

$W'(\lambda)$ presents one maximum in the interval $(0, \pi)$, viz. in the
point $\lambda$ defined by

\[ \lambda = \arccos \frac{A(1 + C^2)}{2C^2}, \tag{247} \]

and two minima, viz. in the points $\lambda = 0$ and $\lambda = \pi$.

Fig. 4. Generating function derivative obtained from formula (245).

Illustration. Considering, for example, the process of type II obtained by
taking A = ·8, B = ·4, we get $C^2$ = ·8, $A < 2C^2/(1 + C^2)$. In full agreement
herewith, the point (A, B) = (·8, ·4), which is plotted in fig. 3, is lying in the
non-dotted part of the unit circle. It follows from the above that the corresponding
function $W'(\lambda)$ presents but one maximum in the interval $(0 < \lambda < \pi)$. This
function $W'(\lambda)$ is shown in the figure above.

b) $|A| > 2C^2/(1 + C^2)$. The roots are lying in the dotted part
of the unit circle.

$W'(\lambda)$ presents one maximum and one minimum, the former being
attained for $\lambda = 0$ if A > 0 and for $\lambda = \pi$ if A < 0, the latter for
$\lambda = \pi$ if A > 0 and for $\lambda = 0$ if A < 0. The curves $W'(\lambda)$ are here
similar to those obtained in the case I, 1 (cf. fig. 2, unbroken lines).

Roughly speaking, if the roots $A \pm iB$ of the characteristic
equation of the oscillatory mechanism are lying close to the
periphery of the unit circle, the maximizing $\lambda$-value (247) is seen
to approximate the intrinsic frequency $\lambda_1 = \arccos A/C$. In other
words, if the intrinsic oscillations are only slightly damped, the
periodogram analysis suggested above will be able to discover the
frequency of the intrinsic oscillation.

On the other hand, holding A/B fixed, and letting C leave the 
immediate neighbourhood of the periphery of the unit circle, the
$\lambda$-values resulting from (247) will deviate more and more from
$\lambda_1 = \arccos A/C$. In case A > 0, the value given by (247) is seen
to be less than $\lambda_1$, while the reverse holds true if A < 0. Thus,
the periodogram will show a tendency to over-estimate the intrinsic
period if this is above 4 time units, and to under-estimate it if the
period is lying between 2 and 4 units. As to periods below 2
time units, these require a finer equidistance (cf. section 4). We
conclude next that the bias will be the larger, the more heavily
the intrinsic oscillation is damped, i. e. the smaller the damping 
factor C is. Further, excepting the cases of an intrinsic period 
equalling exactly 2 or 4 time units, and letting the damping factor 
pass below a well-defined limit, the periodogram will altogether 
cease from giving information about the intrinsic period. 

The disturbing effect pointed out may be looked upon as caused 
by the external factors influencing the oscillatory mechanism. As 
by hypothesis these factors contain a random element, I propose 
the term chance effect for the bias in question. 

The situation may be described by saying that the inference 
drawn from the characteristic equation of the intrinsic oscillations 
does not apply directly to the oscillations of the mechanism when 
influenced by random external factors (cf. p. 95). While the chance 
effect is easily surveyed in the case of one intrinsic damped har- 
monic, the state of things is far more complicated when the mechan- 
ism presents a tendency to composite oscillations. Having stated 
this, our analysis of the chance effect will be brought to an end 
by a few explicit illustrations. 

Illustrations. We shall first consider the case of one intrinsic oscillation, with
damping factor C = ·894, and frequency $\lambda_1$ = arc tg ½ = 26·56°. This case
is elucidated by the diagrams 3 and 4. Inserting A = ·8 and C = √·8 in (246), the
$\lambda$-value maximizing $W'(\lambda)$ is found to be $\lambda$ = arc cos ·9 = 25·84°. The periodogram
thus tends to deliver a period equalling 360/25·84 = 13·93 time units, while the
intrinsic period is 13·55 time units. In agreement with the general results
concerning the case of a non-composite intrinsic oscillation, the period is over-estimated.
Although the damping factor is fairly large, just below unity, the bias is rather
important.
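
The figures of this illustration can be recomputed with a few lines of Python. The sketch below compares the intrinsic frequency arc cos A/C with the frequency at which the periodogram expectation peaks according to (247); the function name is hypothetical, and `None` is returned when no interior maximum exists (case b).

```python
import math


def intrinsic_and_peak_frequency(A, B):
    """Return (lambda_1, lambda_peak) in radians for roots A +/- iB."""
    C2 = A * A + B * B
    lam1 = math.acos(A / math.sqrt(C2))      # intrinsic frequency
    x = A * (1 + C2) / (2 * C2)              # argument of arc cos in (247)
    if abs(x) >= 1:                          # case b): extremes only at 0 and pi
        return lam1, None
    return lam1, math.acos(x)


lam1, lam_peak = intrinsic_and_peak_frequency(0.8, 0.4)
intrinsic_period = 2 * math.pi / lam1        # about 13.55 time units
periodogram_period = 2 * math.pi / lam_peak  # about 13.93 time units
```

Lowering the damping factor while holding A/B fixed widens the gap between the two periods, until the interior maximum disappears altogether.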

Proceeding to some examples of periodogram ordinates derived from the model
series given in section 15, let us first consider the process $\{\delta^{(1)}(t)\}$ as defined by
(103). Continuing in the notations of the present section, we obtain by short
calculations h = 1, p = -·8, $D^2(\eta)$ = ·6, $D^2(\zeta)$ = 1·667. Speaking in the
language of oscillatory mechanisms, we are concerned with a single intrinsic
harmonic, the frequency, period, and damping factor of which are $\pi$, 2, and ·8
respectively. An ordinary periodogram analysis gives negative results, the expectation being
inversely proportional to the length of the series analyzed. However, we know
from the general theory that the expectation of the periodogram ordinate in $\lambda = \pi$
is larger than the expectation in a purely random series. Taking n = 20, and
keeping in mind that we are dealing with the exceptional case $\lambda = \pi$, the latter is
found to equal 1·667/n = ·083. The former, as given by the exact formula (127), is
$E[C^2(20, \pi)]$ = ·59. The approximate value obtained from (239) equals ·75. Now,
the 1000 elements in the model series $(\delta^{(1)})$ have been arranged in 50 sections, each
containing 20 consecutive elements, and the periodogram ordinate $C^2(20, \pi)$ has
been computed for each section from formula (27). The average of the 50 ordinates
thus obtained equals ·56, a value not far from the expectation ·59.

Considering next the process $\{\delta^{(2)}(t)\}$ given by (104) we have h = 1, p = ·8,
$D^2(\eta)$ = ·2, $D^2(\zeta)$ = ·556. Referring again to Fig. 2, it is seen that for small
frequencies the expectation is larger than in the case of a purely random series, but
as before of the same order of magnitude in respect of n. Considering, e. g., the
Fourier coefficients (25) for k = 1 in a sample series section of 20 elements, we have
n = 20, $\lambda = 2\pi/n$ = 18°. Formulae (127) and (239) give respectively
$E[C^2(20, 18°)]$ = ·36 and $E[C^2(20, 18°)] \sim$ ·34, while the corresponding value in the
purely random case is ·11. On the other hand, operating on the model series $(\delta^{(2)})$ in
the same way as before on $(\delta^{(1)})$, we have arrived at an average periodogram ordinate
$E[C^2(20, 18°)]$ equalling ·51. The rather substantial deviation from the expectation
suggests that the periodogram ordinates are subject to a large dispersion. Actually,
this suggestion seems to indicate how the matter stands. At any rate, the
distributions of periodogram ordinates I have constructed on the basis of model time series
have all presented a pronounced skewness, and a very large dispersion: often two
or three of the largest sample values $C^2(n, \lambda)$ constitute alone as much as 20 to 30
per cent of the sum of the 50 sample values in the material. An instructive example
of this is given below.

Recurring to the series $(\delta^{(1)})$, formula (127) yields $E[C^2(20, 18°)]$ = ·046 while
(239) gives $E[C^2(20, 18°)] \sim$ ·038. A periodogram analysis as described in the
previous illustration has given $E[C^2(20, 18°)]$ = ·058. The figure below shows the
distribution of the 50 averaged sample values of the periodogram ordinate dealt
with. The figure contains the curve of summed relative frequencies (F, thick lines),
and a histogram showing the frequencies in the classes 0 to ·01, ·01 to ·02, etc. (f,
broken lines). The frequency curve graduating the histogram has been drawn by
hand (broken curve). The distribution is seen to be very skew, and the histogram
suggests no pronounced maximum for the frequency curve.
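
The approximate expectations quoted from (239) in these illustrations can be recomputed directly. The Python sketch below evaluates the leading term $4D^2(\eta)/(n(1 + p^2 - 2p\cos\lambda))$ for a frequency strictly between 0 and $\pi$; the function name is hypothetical, and the exact formula (127) is not reproduced here.

```python
import math


def approx_expected_ordinate(p, var_eta, n, lam_deg):
    """Leading term of E[C^2(n, lambda)] for zeta(t) - p*zeta(t-1) = eta(t).

    Uses 4 D^2(eta) / (n (1 + p^2 - 2 p cos lambda)); meant for
    0 < lambda < pi (the end points are exceptional cases in the text).
    """
    lam = math.radians(lam_deg)
    return 4 * var_eta / (n * (1 + p * p - 2 * p * math.cos(lam)))


# Second model series: p = 0.8, D^2(eta) = 0.2; the text quotes about .34.
second_series = approx_expected_ordinate(0.8, 0.2, 20, 18)
# First model series: p = -0.8, D^2(eta) = 0.6; the text quotes about .038.
first_series = approx_expected_ordinate(-0.8, 0.6, 20, 18)
# Purely random series with the dispersion of the second series: about .11.
purely_random = 4 * (0.2 / (1 - 0.8 ** 2)) / 20
```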

The above discussion is related to certain results obtained by E. Slutsky (1934)
about sampling problems in periodogram analysis. It has long been recognized, of
course, that great caution is necessary when judging the reliability of a periodogram
ordinate, especially if we have no prior knowledge about possible periods. There is
further the difficulty that the largest periodogram ordinate has a larger expectation
than a fixed ordinate, or a randomly chosen ordinate. A classic result by R. A. Fisher
(1929) gives the distribution of the largest ordinate in the case of a purely random
normal process.

Fig. 5. Distribution of periodogram ordinates derived from the model series (see
table 3).


Next, we shall give a few applications of the inversion formula 
(200). Working on the model series given in section 15, we shall
exemplify the construction of the primary series from a given series 
of autoregression. 

Illustrations. Considering the model series $(\delta_t^{(1)})$ as defined by (103), we have
$\alpha_t = \delta_t^{(1)} + ·8\,\delta_{t-1}^{(1)}$ for t = 1, 2, ... Inserting successively the $\delta$-values given in
table 3, we get $\alpha_1$ = 1·00, $\alpha_2$ = ·20 + ·8 × 1·00 = 1·00, $\alpha_3$ = -·16 + ·8 × ·20
= ·00, etc., in full agreement with table 1. Thus, knowing the autoregression
coefficient $a_1$ = ·8, we can reconstruct the primary series of the autoregression
series examined. In the same way, we can reconstruct without difficulty the primary
series of the model series $(\delta^{(2)})$ and $(\delta^{(3)})$.
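
The reconstruction just described amounts to one multiplication and one addition per element. A minimal Python sketch for the case h = 1 follows; the function names are hypothetical, the element preceding the first one is taken as zero (an assumption made here for illustration), and only the first three primary values are those quoted in the text, the remaining ones being made up.

```python
def autoregression_series(primary, a1):
    """Build x_t from x_t + a1 * x_{t-1} = primary_t, starting from x_0 = 0."""
    series, previous = [], 0.0
    for value in primary:
        previous = value - a1 * previous
        series.append(previous)
    return series


def primary_series(series, a1):
    """Invert the scheme: primary_t = x_t + a1 * x_{t-1}, with x_0 = 0."""
    primary, previous = [], 0.0
    for x in series:
        primary.append(x + a1 * previous)
        previous = x
    return primary


alpha = [1.00, 1.00, 0.00, -0.50, 0.25]      # last two values are made up
delta = autoregression_series(alpha, 0.8)    # starts 1.00, .20, -.16, ...
recovered = primary_series(delta, 0.8)
```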

In view of the applications, an important problem is to find the coefficients (a)
belonging to a given series of autoregression. This problem will be discussed in
detail in section 32. For the moment we shall only remark that the relations (222)
together with the last relation in (221) form a system of linear equations which will
permit us to derive the coefficients $a_1, \ldots, a_h$ in terms of the autocorrelation
coefficients $r_1, \ldots, r_h$. Thus, identifying the coefficients $r_k$ with the serial
coefficients of the series examined, the linear system will give a set of approximate
coefficients $a_1, \ldots, a_h$. We can then derive as before the primary series which
corresponds to the coefficients $a_1, \ldots, a_h$ arrived at.
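
The linear system mentioned here can be solved mechanically. The Python sketch below assumes that the relations take the form $r_k + a_1 r_{k-1} + \cdots + a_h r_{k-h} = 0$ for k = 1, ..., h, with $r_0 = 1$ and $r_{-k} = r_k$, which agrees with equation (234) for h = 2; the function name is hypothetical and the solver is a plain Gauss-Jordan elimination.

```python
def autoregression_coefficients(r):
    """Solve r_k + a_1 r_{k-1} + ... + a_h r_{k-h} = 0 (k = 1..h) for a_1..a_h.

    r -- list [r_1, ..., r_h]; r_0 = 1 and r_{-k} = r_k are assumed.
    """
    h = len(r)
    rho = [1.0] + list(r)
    # Augmented matrix: row k holds the coefficients of a_1..a_h and -r_k.
    M = [[rho[abs(k - j)] for j in range(1, h + 1)] + [-rho[k]]
         for k in range(1, h + 1)]
    for col in range(h):                      # Gauss-Jordan, partial pivoting
        pivot = max(range(col, h), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for i in range(h):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [M[i][h] for i in range(h)]


# h = 1 with r_1 = -0.8 gives a_1 = 0.8.
a_first = autoregression_coefficients([-0.8])
# h = 2 with roots p = 0.8, q = -0.6: r_1 = (p + q)/(1 + pq), r_2 from (222).
r1 = 0.2 / 0.52
a_second = autoregression_coefficients([r1, 0.2 * r1 + 0.48])
```

Feeding in serial coefficients instead of theoretical autocorrelation coefficients gives approximate values of the $a_k$, as the text remarks.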

We give below the first five serial coefficients of the model series $(\delta)$, and the
corresponding autocorrelation coefficients as derived from formulae (238) and (243).

Table 5. Serial and autocorrelation coefficients of the model series $(\delta)$.

Series     Coefficient              k = 1       2       3       4       5
(δ^(1))    serial                   -·786    ·647   -·498    ·375   -·291
           autocorrelation r_k      -·800    ·640   -·512    ·410   -·328
(δ^(2))    serial                    ·862    ·700    ·580    ·482    ·398
           autocorrelation r_k       ·800    ·640    ·512    ·410    ·328
(δ^(3))    serial                    ·127   -·628   -·194    ·397    ·186
           autocorrelation r_k       ·121   -·626   -·204    ·366    ·206


It is interesting to notice that although the model series consist of as many as 
1000 elements each, the deviations between empirical and hypothetical correlation 
coefficients are rather substantial (cf. also p. 50 and p. 109). 

Our analysis of the process of linear autoregression will be 
concluded by revealing a connexion with the » sinusoidal limit 
theorems » of E. Slutsky and V. Romanovsky touched upon in 
section 16. 

Let

\[ L(x) = x^2 - 2Ax + 1 = 0, \qquad -1 < A < 1, \]

stand for the characteristic equation of a simple harmonic

\[ P_1 \cos \lambda_1 t + P_2 \sin \lambda_1 t, \]

and let $\{\zeta^{(1)}(t)\}$, $\{\zeta^{(2)}(t)\}$, ... represent a sequence of processes of
linear autoregression defined by

\[ \zeta^{(p)}(t) - 2A_p \cdot \zeta^{(p)}(t-1) + C_p^2 \cdot \zeta^{(p)}(t-2) = \eta^{(p)}(t). \]

Let it further be assumed that the processes have equal dispersion,

\[ D(\zeta^{(p)}(t)) = \sigma, \]

and that

\[ \lim_{p \to \infty} A_p = A, \qquad \lim_{p \to \infty} C_p = 1. \]

… elucidating certain points of the autoregression analysis as presented
in sections 19 and 20.

By definition, the general formula for a process $\{\zeta(t)\}$ of moving
averages reads

\[ \{\zeta(t)\} = \{\eta(t)\} + b_1 \{\eta(t-1)\} + \cdots + b_h \{\eta(t-h)\}, \tag{249} \]

where $\{\eta(t)\}$ is purely random, and the sequence $(b) = (b_1, \ldots, b_h)$ is
real. As before, we shall assume that $D(\eta)$ is finite, and that
$E[\eta] = 0$. Thus (249) forms a special case of the variable defined
by (199), and it follows further that the formal developments remain
valid if $\{\eta(t)\}$ is non-autocorrelated.

In the process of linear autoregression, the autocorrelation
coefficients and forecasting values were found to follow certain damped
harmonics. In the present case only the first h elements in the
sequences mentioned are different from zero. In fact, (204) and
(205) yield for a process $\{\zeta(t)\}$ given by (249)

\[ D^2(\zeta) = (1 + b_1^2 + b_2^2 + \cdots + b_h^2) \cdot D^2(\eta), \]

\[ r_k(\zeta) = \begin{cases} (b_k + b_1 b_{k+1} + \cdots + b_{h-k} b_h)/(1 + b_1^2 + \cdots + b_h^2) & \text{for } k \le h, \\ 0 & \text{for } k > h, \end{cases} \tag{250} \]

where k > 0 and $b_0 = 1$. (Formulae given by Professor H. Cramér
in his 1933 Course.) Next, (213) gives

\[ E_C[\zeta(t+k)] = \begin{cases} b_k \cdot \eta_t + b_{k+1} \cdot \eta_{t-1} + \cdots + b_h \cdot \eta_{t-h+k} & \text{for } 0 < k \le h, \\ 0 & \text{for } k > h, \end{cases} \]

where the forecast is based upon the condition

\[ (C) = (\eta(t) = \eta_t, \ \eta(t-1) = \eta_{t-1}, \ \ldots, \ \eta(t-h+1) = \eta_{t-h+1}). \]

Inserting (250) in the general formula (116) for the generating
function $W(\lambda)$ of the autocorrelation coefficients in a stationary
process, we conclude that $W(\lambda) - \lambda$ reduces to a finite
trigonometrical sum. Thus, in the present case $W'(\lambda)$ exists, and is like
$W(\lambda)$ uniformly bounded. Consequently, replacing »linear
autoregression» by »moving averages» in theorem 11, we get a
corresponding theorem covering the present case. We conclude, i. a.,
that periodogram analysis is an inadequate method of research in
the case of moving averages also: the expectation of an arbitrary
periodogram ordinate is inversely proportional to the length of the
series under analysis. Formula (256) gives $W'(\lambda)$ explicitly.

In view of the applications, the following questions call for
investigation (cf. the illustrations on p. 119). The autocorrelation
coefficients $r_k$ of a process of moving averages being given, is it
possible to derive the primary process and the coefficients (b) of
the moving average? In particular, does there exist a relation of
type (200) yielding the primary process? If the answer to the
second question is in the affirmative, how can the coefficients (a)
be obtained? The first of these problems was formulated by Professor
H. Cramér in his 1933 Course.
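
Formula (250) for the autocorrelation coefficients of a moving average translates into a short computation, which makes the abrupt vanishing of $r_k$ beyond lag h easy to see. A minimal Python sketch, with a hypothetical function name:

```python
def moving_average_autocorrelations(b):
    """r_0, ..., r_h for zeta(t) = eta(t) + b_1 eta(t-1) + ... + b_h eta(t-h).

    All autocorrelation coefficients of lag greater than h vanish.
    """
    c = [1.0] + list(b)                       # b_0 = 1
    h = len(b)
    k2 = sum(x * x for x in c)                # D^2(zeta) / D^2(eta)
    return [sum(c[j] * c[j + k] for j in range(h - k + 1)) / k2
            for k in range(h + 1)]


simple_average = moving_average_autocorrelations([1.0, 1.0])   # 1, 2/3, 1/3
first_order = moving_average_autocorrelations([0.5])           # 1, 0.4
```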

It is suitable to start the analysis with the second question.
Denoting the process considered by $\{\zeta(t)\}$, we shall, until further
notice, assume that $\{\zeta(t)\}$ is given by (249).

In the first place we observe that theorem 9 covers the case
when all the roots of the equation

\[ x^h + b_1 x^{h-1} + \cdots + b_{h-1} x + b_h = 0 \tag{251} \]

are of modulus less than unity. The theorem states that if this
condition is satisfied, an infinite sequence $(a) = (a_1, a_2, \ldots)$ such that
(200) holds is given by the system (97) and the difference relations
obtained for $a_k$ when replacing the $a_k$'s by the $b_k$'s in (96). These
relations constitute a difference equation of order h satisfied by the
sequence (a), and it should further be observed that the sequence
(a) arrived at forms a composed damped harmonic.
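
The sequence (a) can be generated step by step. The Python sketch below assumes that the inversion (200) has the form $\eta(t) = \zeta(t) + a_1 \zeta(t-1) + a_2 \zeta(t-2) + \cdots$, so that the coefficients satisfy $a_k + b_1 a_{k-1} + \cdots + b_h a_{k-h} = 0$ with $a_0 = 1$ and $a_k = 0$ for negative k; the function name is hypothetical.

```python
def inversion_coefficients(b, n):
    """First n coefficients a_1..a_n with (1 + sum a_k z^k)(1 + sum b_j z^j) = 1."""
    a = [1.0]                                  # a_0 = 1
    for k in range(1, n + 1):
        a.append(-sum(b[j - 1] * a[k - j]
                      for j in range(1, min(k, len(b)) + 1)))
    return a[1:]


# h = 1, b_1 = 0.5: a_k = (-0.5)^k, a damped alternating sequence.
geometric = inversion_coefficients([0.5], 3)
# h = 2 with roots 0.8 and -0.6 of x^2 + b_1 x + b_2 = 0, i.e. b = (-0.2, -0.48).
damped = inversion_coefficients([-0.2, -0.48], 2)
```

When a root of (251) approaches the unit circle the sequence decays more and more slowly, which is the difficulty treated in the singular case.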

We have found that the condition attached to the roots of (251)
will secure the representation (200). Under these circumstances the
relations (208) and (210) derived from (206) and (207) must hold
true. Explicitly, these read in the present case as follows:

\[ a_{2h+k} r_h + a_{2h+k-1} r_{h-1} + \cdots + a_{h+k+1} r_1 + a_{h+k} + a_{h+k-1} r_1 + \cdots + a_{k+1} r_{h-1} + a_k r_h = 0, \tag{252} \]

\[ \begin{aligned}
a_{2h} r_h + a_{2h-1} r_{h-1} + \cdots + a_{h+1} r_1 + a_h + a_{h-1} r_1 + \cdots + a_1 r_{h-1} + r_h &= 0, \\
a_{2h-1} r_h + a_{2h-2} r_{h-1} + \cdots + a_h r_1 + a_{h-1} + a_{h-2} r_1 + \cdots + a_1 r_{h-2} + r_{h-1} &= 0, \\
\cdots \qquad & \\
a_{h+1} r_h + a_h r_{h-1} + \cdots + a_2 r_1 + a_1 + r_1 &= 0,
\end{aligned} \tag{253} \]

\[ \begin{aligned}
a_h r_h + a_{h-1} r_{h-1} + \cdots + a_1 r_1 + 1 &= 1/K^2, \\
a_{h-1} r_h + \cdots + a_1 r_2 + r_1 &= b_1/K^2, \\
\cdots \qquad & \\
a_2 r_h + a_1 r_{h-1} + r_{h-2} &= b_{h-2}/K^2, \\
a_1 r_h + r_{h-1} &= b_{h-1}/K^2, \\
r_h &= b_h/K^2.
\end{aligned} \tag{254} \]


Comparing the above coefficients $a_k$ and the coefficients $a_{kn}$
yielded by an autoregression analysis of $\{\zeta(t)\}$ as described in section
…, and letting $\varepsilon > 0$ be arbitrarily given, it follows that there exists
an n such that

\[ K^{-2}(1 + \varepsilon) D^2(\zeta) \ge D^2(\zeta(t) + a_1 \zeta(t-1) + \cdots + a_n \zeta(t-n)) \ge D^2(\zeta(t) + a_{1n} \zeta(t-1) + \cdots + a_{nn} \zeta(t-n)) \ge K^{-2} \cdot D^2(\zeta), \]

the second inequality resulting from the definition of the
coefficients $a_{in}$, and the third being similar to (182). Inserting (249),
an elementary transformation yields

\[ \varepsilon \ge (a_1 + b_1)^2 + (a_2 + a_1 b_1 + b_2)^2 + \cdots + (a_n + a_{n-1} b_1 + \cdots + a_{n-h} b_h)^2 + \cdots + (a_n b_h)^2 \]
\[ \ge (a_{1n} + b_1)^2 + (a_{2n} + a_{1n} b_1 + b_2)^2 + \cdots + (a_{nn} + a_{n-1,n} b_1 + \cdots + a_{n-h,n} b_h)^2 + \cdots + (a_{nn} b_h)^2 \ge 0. \]

Concluding that $|a_{in} - a_i| < c_i \cdot \sqrt{\varepsilon}$, where $c_i$ is independent of n,
it follows that $a_{in} \to a_i$ as $n \to \infty$. Moreover, forming the difference
of the residuals based on $(a_1, \ldots, a_n)$ and $(a_{1n}, \ldots, a_{nn})$, a simple
application of Schwarz' inequality shows that the non-autocorrelated
process yielded by the autoregression analysis is identical with the
primary process.

Considering the singular case when (251) presents at least one root on the boundary
of the unit circle, we conclude from theorem 8 that the series $\sum a_k^2$ will be
divergent, and that no representation of type (200) will exist. By elementary
transformations we shall show that this obstacle to calculating the primary process may
be removed by means of a limit passage.

Let (249) represent a moving average, $x_n$ the roots of the characteristic equation
(251), and let $|x_n| \le 1$. Introducing auxiliary averages $\{\zeta^{(i)}(t)\}$ defined by

\[ \zeta^{(i)}(t) = \eta(t) + b_1^{(i)} \cdot \eta(t-1) + \cdots + b_h^{(i)} \cdot \eta(t-h), \qquad i = 1, 2, \ldots, \]

let the symbols referring to $\{\zeta^{(i)}(t)\}$ be marked by (i). Let $x_n^{(i)} = x_n$ when $|x_n| < 1$,
and let $x_n^{(i)} = x_n - \Delta_i$, $|x_n^{(i)}| = 1 - \varepsilon_i$ when $|x_n| = 1$, where $0 < \varepsilon_i = |\Delta_i| \to 0$
as $i \to \infty$.

By construction, the inversion formula (200) applies to $\{\zeta^{(i)}(t)\}$,

\[ \eta(t) = \zeta^{(i)}(t) + a_1^{(i)} \cdot \zeta^{(i)}(t-1) + a_2^{(i)} \cdot \zeta^{(i)}(t-2) + \cdots. \]

The limit relation desired is based on the coefficients $a_k^{(i)}$ and reads

\[ \{\eta(t)\} = \lim_{i \to \infty} \left[ \{\zeta(t)\} + a_1^{(i)} \cdot \{\zeta(t-1)\} + a_2^{(i)} \cdot \{\zeta(t-2)\} + \cdots \right]. \tag{255} \]

Writing $\{\eta(t, i)\}$ for the process under the limes sign, it is sufficient to prove that
$D_i^2 = D^2[\eta(t, i) - \eta(t)]$ is of the same order of magnitude as $\varepsilon_i$ (see p. 40).

Paying regard to (198), we get $D_i^2 = \sum_{t=1}^{\infty} (L[a_t^{(i)}])^2$, where

\[ L[x(t)] = x(t) + b_1 \cdot x(t-1) + \cdots + b_{h-1} \cdot x(t-h+1) + b_h \cdot x(t-h). \]

Write next $a_t^{(i)}$ on the form $a_t^{(i)} = \sum_n H_n^{(k)}(t) \cdot (x_n^{(i)})^t$, where $H^{(k)}$ is a polynomial of
order k, and $k_n$ the multiplicity of the root $x_n$ (cf. (33)). Inserting $a_t^{(i)}$ in $D_i^2$, the
terms in $\sum$ of type $t^k \cdot (x_n^{(i)})^t$, where $x_n^{(i)} = x_n$, will cancel out because they satisfy the
relations $L[t^k \cdot (x_n^{(i)})^t] = 0$. It remains to estimate the terms involving $t^k \cdot (x_n^{(i)})^t$,
where $x_n^{(i)} = x_n - \Delta_i$. According to the inequality of Schwarz it is evidently
sufficient to verify that, for any of the k-values in question,

\[ S = \sum_{t=1}^{\infty} \left( L[t^k \cdot (x_n - \Delta_i)^t] \right)^2 \to 0 \quad \text{as } i \to \infty. \]

To prove this, remove the factors $(x_n - \Delta_i)^{t-h}$ from $L[t^k \cdot (x_n - \Delta_i)^t]$, and
develop the remaining terms $(x_n - \Delta_i)^j$ according to the binomial theorem. Then
we get

\[ S = \sum_t (x_n - \Delta_i)^{2t-2h} \cdot \left[ P(t, k) + P_1(t, k) \cdot \Delta_i + \cdots + P_h(t, k) \cdot \Delta_i^h \right]^2, \]

where

\[ P(t, k) = x_n^{-t+h} \cdot L[t^k \cdot x_n^t] = x_n^h \cdot t^k + b_1 \cdot x_n^{h-1} \cdot (t-1)^k + \cdots + b_h \cdot (t-h)^k. \]

Evidently $P(t, k) = 0$ for all t. Disregarding constant factors, we have further

\[ P_1(t, k) = h \cdot x_n^h \cdot t^k + (h-1)\, b_1 \cdot x_n^{h-1} (t-1)^k + \cdots + b_{h-1} \cdot x_n \cdot (t-h+1)^k, \]
\[ P_2(t, k) = h(h-1) \cdot x_n^h \cdot t^k + (h-1)(h-2)\, b_1 \cdot x_n^{h-1} \cdot (t-1)^k + \cdots + 2\, b_{h-2} \cdot x_n^2 \cdot (t-h+2)^k, \quad \text{etc.} \]

Paying regard to the identities $P(t, k) = P(t, k-1) = \cdots = P(t, 0) = 0$, we
get without difficulty $P(t+1, k+1) - P(t, k+1) = 0$ for all t. Hence $P(t, k+1)$
must reduce to a constant, say $c_0$. In the same way we get $P(t+1, k+2) -
P(t, k+2) = c \cdot P(t, k+1)$, which shows that $P(t, k+2)$ is linear in respect
to t, and by induction we find that $P(t, k+s)$ is a polynomial in t of order s - 1.
Elementary transformations show that $c_0 = P(t, k+1) - (t-h) \cdot P(t, k) =
P_1(t, k)$ for all t. Similarly, $P(t, k+2) - (t-h+1) \cdot P_1(t, k) = P_2(t, k)$ is
linear in respect of t, and repeating the procedure we find that $P_s(t, k)$ is of order
s - 1.

We conclude that S is linearly composed of a finite number of terms of type

\[ S_k = \Delta_i^{k+2} \sum_{t=1}^{\infty} t^k \cdot (x_n - \Delta_i)^{2t}, \qquad k \ge 0. \]

Hence

\[ |S_k| \le \varepsilon_i^{k+2} \sum_{t=1}^{\infty} t^k \cdot (1 - \varepsilon_i)^{2t} \le A \cdot \varepsilon_i^{k+2} / [1 - (1 - \varepsilon_i)^2]^{k+1}, \]

where A is independent of i. It is seen that for
any finite $k \ge 0$ the term $|S_k|$ is of an order $\le \varepsilon_i$, which completes the proof.

The above analysis has shown that if (251) has no root $x_k$ falling
outside the unit circle, certain well-defined linear operations on the
moving average given by (249) will yield the primary process $\{\eta(t)\}$.
If $|x_k| \le 1$ for all k, the sequence (b) and the process $\{\zeta(t)\}$ will be
termed regular.

Turning next to the generating function $W(\lambda)$ defined by (121),
let (249) be an arbitrary process of moving averages, and let its
autocorrelation coefficients be represented by $r_k$. Paying regard to
(250), we obtain the fundamental identity

\[ K^{-2} (x^h + b_1 x^{h-1} + \cdots + b_{h-1} x + b_h)(b_h x^h + b_{h-1} x^{h-1} + \cdots + b_1 x + 1) = r_h x^{2h} + r_{h-1} x^{2h-1} + \cdots + r_1 x^{h+1} + x^h + r_1 x^{h-1} + \cdots + r_{h-1} x + r_h. \tag{256} \]

Replacing x by $e^{i\lambda}$, formula (121) shows that we get $e^{ih\lambda} \cdot W'(\lambda)$.

Until further notice, we shall again assume that (249) is regular.
Denoting as before the zeros of the first factor by $x_k$, it is seen
that the zeros of the second factor are given by $1/x_k$. Consequently,
the zeros of the right member may be denoted $x_1, \ldots, x_{2h-1}, x_{2h}$,
where

\[ x_k = 1/x_{2h+1-k}, \qquad 0 < |x_1| \le |x_2| \le \cdots \le |x_h| \le 1 \le |x_{h+1}| \le \cdots \le |x_{2h}|. \]

Further, it follows that if there exists another sequence, say
$(1, b_1^*, \ldots, b_h^*)$, such that the corresponding moving average will have
autocorrelation coefficients coinciding with those of (249), then one
zero of the polynomial $x^h + b_1^* x^{h-1} + \cdots + b_{h-1}^* x + b_h^*$ will equal
either $x_1$ or $1/x_1$, another either $x_2$ or $1/x_2$, etc. Evidently, there
are at most $2^h$ real sequences of this type, say $(b^{(0)}), (b^{(1)}), \ldots$ Letting
$(b^{(0)})$ represent the regular sequence started from, a short reflection
reveals that all the other sequences in the group are non-regular.

If in this way a group $(b^{(i)})$ of sequences is attached to every
regular sequence $(b^{(0)})$, it is clear that an arbitrary sequence $(b^*) =
(1, b_1^*, \ldots, b_h^*)$ will belong to one, and only one, of these groups.
It should further be observed that this group may be constructed
with the use of only the sequence $(b^*)$. Similarly, a group $(b^{(i)})$ will
evidently be uniquely determined by the corresponding sequence of
autocorrelation coefficients.
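
The construction of such a group can be illustrated numerically. The Python sketch below restricts itself, as a simplifying assumption, to the case where all zeros of the polynomial are real and different from zero: each zero is either kept or replaced by its reciprocal, the polynomial is rebuilt, and the autocorrelation coefficients of the resulting moving averages are compared. All function names are hypothetical.

```python
from itertools import product


def poly_from_roots(roots):
    """Coefficients (1, b_1, ..., b_h) of the product of (x - root)."""
    coeffs = [1.0]
    for root in roots:
        coeffs = [c - root * d
                  for c, d in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs


def autocorrelations(c):
    """r_0..r_h of the moving average with coefficients c = (1, b_1, ..., b_h)."""
    k2 = sum(x * x for x in c)
    return [sum(c[j] * c[j + k] for j in range(len(c) - k)) / k2
            for k in range(len(c))]


def group_of_sequences(roots):
    """All sequences obtained by replacing some zeros x_k by 1/x_k."""
    return [poly_from_roots([x if keep else 1.0 / x
                             for x, keep in zip(roots, mask)])
            for mask in product([True, False], repeat=len(roots))]


group = group_of_sequences([0.8, -0.6])      # first member is the regular one
reference = autocorrelations(group[0])
```

For h = 2 the group has at most four members; only the first, whose zeros all lie inside the unit circle, is regular in the sense defined above.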

Thus prepared, let {r](t)} be a purely random process, and let 
(b^) be a group of finite sequences as defined above. Writing 

(257) [KM]* = 1 + [6(/>] 2 + [&(/>]* + • • ■ + [&j*)] 2 , 

let, correspondingly, a group = ({£ (,:) (£; rj)}) of moving averages 
be defined by 

7T(°) 

(258) £ (0 (f ; rj) = ~ [rj (© + V (t- 1) + • • • + rj (t - h)\. 

Marking the symbols referring to different processes in a group by
corresponding indices, it follows from the construction of the group
$(\zeta^{(i)})$ that

(259)   $D(\zeta^{(i)}) = D(\zeta^{(0)}); \quad r_k^{(i)} = r_k^{(0)}, \quad k = 0, \pm 1, \pm 2, \ldots.$

Further, the group will contain one, and only one, regular process,
viz. $\{\zeta^{(0)}(t)\}$. This will be assumed to be given by (249), and be
alternatively denoted by $\{\zeta(t)\}$.

If all roots of (251) fall on the periphery of the unit
circle, the group $(\zeta^{(i)})$ will evidently contain only the process $\{\zeta^{(0)}(t)\} =
\{\zeta(t)\}$. Otherwise the group will include more than one process,
at most $2^h$ in number. Furthermore, it should be observed that an
equivalent construction of the group $(\zeta^{(i)})$ is possible on the basis
of the primary process $\{\eta(t)\}$ and the characteristics (259) common
to the processes in the group.

Referring to the autoregression analysis as set forth in section
19, it is seen that the coefficients in the formulae for the residuals
$\eta(t; n)$ involve only the autocorrelation coefficients and the dispersion
of the process under analysis. Let this fact be combined with
the above observation that the limit residual $\lim_{n \to \infty} \eta(t; n)$ of a regular
process of moving averages may be obtained directly, viz. either
from (200) or from (255). We conclude that the autoregression residuals,
say $\{\eta^{(i)}(t)\}$, of the non-regular processes in a group $(\{\zeta^{(i)}(t; \eta)\})$
will be given by corresponding linear operations, and that these
expressions will involve exactly the same coefficients $(a)$ as in the
case of the regular average.

Considering in the first place the case when all the roots of the
equation (251) obtained from the regular process (249) in the group
fall in the interior of the unit circle, the above argument
yields

(260)   $\frac{K^{(i)}}{K^{(0)}} \cdot \eta^{(i)}(t) = \eta(t) + (a_1 + b_1^{(i)}) \cdot \eta(t-1) + (a_2 + a_1 b_1^{(i)} + b_2^{(i)}) \cdot \eta(t-2) + \cdots$
        $+ (a_h + a_{h-1} b_1^{(i)} + \cdots + b_h^{(i)}) \cdot \eta(t-h)$
        $+ (a_{h+1} + a_h b_1^{(i)} + \cdots + a_1 b_h^{(i)}) \cdot \eta(t-h-1) + \cdots.$


Since $(b^{(i)}) \ne (b)$, the residual $\eta^{(i)}(t)$ cannot reduce to $\eta(t)$. Further,
were $\eta^{(i)}$ a finite moving average, it would follow that $r_k(\eta^{(i)}) \ne 0$
for some $k > 0$. As $\eta^{(i)}$ is non-autocorrelated this is impossible, so
$\eta^{(i)}$ as given by (260) must be an infinite moving average (cf. section
15 γ). Now, paying regard to the relations (250) and (252)-(253),
an elementary transformation will verify that $r_k(\eta^{(i)}) = 0$ for $k \ne 0$.

On the other hand, if at least one root of (251) lies on the
periphery of the unit circle, the representation (255) gives

$\{\eta^{(i)}(t)\} = \lim_{\lambda \to \infty} \left[ \{\zeta^{(i)}(t)\} + a_1^{(\lambda)} \{\zeta^{(i)}(t-1)\} + a_2^{(\lambda)} \{\zeta^{(i)}(t-2)\} + \cdots \right].$

By construction, the coefficients $b_n^{(\lambda)}$ connected with (255) are such
that $\lim_{\lambda \to \infty} b_n^{(\lambda)} = b_n^{(0)} = b_n$. Keeping this in mind, we conclude from
(96) and (97) that $\lim_{\lambda \to \infty} a_n^{(\lambda)} = a_n$. Now, the relation (255) implies that
we may express the $\zeta$'s in terms of the $\eta$'s, and that the resulting
sum, say $\sum_k \{\eta(t-k)\} \cdot c_k^{(\lambda)}$, has a limit equalling the sum of the limits
$\{\eta(t-k)\} \cdot \lim c_k^{(\lambda)}$. Since $\{\zeta^{(i)}\}$ and $\{\zeta\}$ have identical autocorrelation
properties, we may in the above relation perform the same
procedure on $\{\zeta^{(i)}\}$. It follows that the representation (260) holds
even in this case, without a limit passage being required.

Having now illustrated theorem 6 by means of a process of moving
averages, the representation secured by theorem 7 is readily
obtained directly. In fact, since the coefficients $(b)$ in the canonical
formula in theorem 7 are derived solely from the dispersion and
the autocorrelation coefficients of the process considered, we conclude
that they must be identical for all processes in a group $(\zeta^{(i)})$
as defined above. Thus we have (cf. also (258))

(261)   $\{\zeta^{(i)}(t)\} = \{\eta^{(i)}(t)\} + b_1 \{\eta^{(i)}(t-1)\} + \cdots + b_h \{\eta^{(i)}(t-h)\}$

for all processes in the group $(\zeta^{(i)})$. It should be noticed that (254)
gives the coefficients $(b)$ in terms of the sequences $(a)$ and $(r)$ characteristic
to the group.

Summing up the main results, the analysis gives the following 
answers to the questions set forth on p. 123. The dispersion and 
the autocorrelation coefficients of a process of moving averages 
being given, there will in general exist a well-defined group of 
moving averages with the same characteristics, and with the same 
primary process. These moving averages are limited in number, 
and it is possible to construct the corresponding sequences of coefficients
$(b)$ by means of the characteristics prescribed. Alternatively,
if one sequence $(b)$ in a group is known, the others are uniquely
determined. — Among the moving averages in a group it is pos- 
sible to distinguish one, the regular process, which alone has the 
property that the primary process will be given either by a rela- 
tion (200) or by a limit relation of the same type. The coefficients 
(a) in these representations are uniquely determined by the coeffi- 
cients (b) of the regular process. — For the non-regular processes 
in a group, there exists no relation of type (200) yielding the primary
process. However, inserting a non-regular process $\{\zeta^{(i)}(t)\}$ in the
representation of the primary process in terms of the regular process
belonging to the same group, we get the non-autocorrelated
residual of $\{\zeta^{(i)}(t)\}$ secured by theorem 6. According to theorem 7,
the process $\{\zeta^{(i)}(t)\}$ may be looked upon as a moving average of its
residual. The coefficients of this average are nothing else than
the coefficients $(b)$ of the regular process in the group around $\{\zeta^{(i)}(t)\}$.

In connexion with the applications in section 31, we shall derive 
a linear relation of more general type than (200), which yields the 
primary process in terms of a non-regular moving average. 

The autoregression analysis has given us no tool for distinguish- 
ing between the different moving averages in a group. If more 
precise information is required, other methods have to be applied, 
e. g. an analysis of conditioned variables (cf. p. 164). Such lines of 
research falling outside of the scope of the present study, this section 
will be terminated by some explicit examples of the previous analysis. 

Illustrations. 1) Let the autocorrelation coefficients of a moving average be
given by $r_1 = -.5$, $r_n = 0$ for $n > 1$.


The relation (256) reads in this case

$.5 (x - 1)(-x + 1) = -.5 x^2 + x - .5.$

We conclude that the group $(b^{(i)})$ contains only one sequence, viz. $(1, -1)$. Thus,
the general formula of type (249) for a moving average with the given autocorrelation
coefficients reads

(262)   $\{\zeta(t)\} = \{\eta(t)\} - \{\eta(t-1)\}.$

The equation (251) has only one root, $x_1 = 1$, and this falls on the unit circle.
Hence, in order to express $\eta$ in terms of $\zeta$, we have to apply the limit relation (255).
Taking, for instance, $x_1^{(k)} = 1 - 10^{-k}$, we get

(263)   $\{\eta(t)\} = \lim_{k \to \infty} \left[ \{\zeta(t)\} + (1 - 10^{-k}) \cdot \{\zeta(t-1)\} + (1 - 10^{-k})^2 \cdot \{\zeta(t-2)\} + \cdots \right].$

Table 2 contains a model series section of a process of type (262). Taking k= 1, 
and applying (263) to the last element but one in this section, it is seen that we get 
the following approximation to the last element in table 1, (2), 

$-2 + 2(.9) - (.9)^3 + (.9)^4 - (.9)^5 - (.9)^6 + (.9)^7 + (.9)^9 + \cdots.$

A computation of this sum has given $-.97$, which is fairly close to the exact value,
i.e. —1. With any prescribed accuracy, it is possible to reconstruct in this way the 
series (a <2) ) on the basis of a sufficiently long series (pi)- 
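
The quality of such an approximation can be explored on simulated data. The sketch below is an illustration only: the primary series is drawn as Gaussian with unit variance (an assumption, not the table used in the text), the geometric sum is truncated after `m` terms, and all names are hypothetical. For a fixed $x < 1$ the error of the untruncated sum is $-(1 - x) \sum_{j \ge 1} x^{j-1} \eta(t-j)$, whose mean square is $(1 - x)/(1 + x)$ for a unit-variance primary series; the simulated value should lie near that figure.

```python
import random


def moving_average_262(eta):
    """zeta(t) = eta(t) - eta(t-1); entry i of the result belongs to t = i + 1."""
    return [eta[t] - eta[t - 1] for t in range(1, len(eta))]


def approximate_primary(zeta, t, x, m):
    """Truncated geometric sum: sum over j = 0..m of x**j * zeta(t - j), with fixed x < 1."""
    return sum((x ** j) * zeta[t - j] for j in range(m + 1))


random.seed(0)  # arbitrary seed, for reproducibility only
eta = [random.gauss(0.0, 1.0) for _ in range(3000)]
zeta = moving_average_262(eta)  # zeta[i] = eta[i + 1] - eta[i]

x, m = 0.9, 150  # x = 1 - 10**(-k) with k = 1
errors = [approximate_primary(zeta, t, x, m) - eta[t + 1]
          for t in range(m, len(zeta))]
mse = sum(e * e for e in errors) / len(errors)
print("simulated mean square error:", mse)
print("(1 - x) / (1 + x):          ", (1.0 - x) / (1.0 + x))
```

Taking $x$ closer to 1 reduces the error, at the price of a longer stretch of the observed series being needed.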

2) Let the coefficients $(b)$ of a moving average (249) be given by $(1, 2)$.

Forming the relation (256), we get

$.2 (x + 2)(2x + 1) = .4 x^2 + x + .4,$

and conclude that $r_1 = .4$, $r_n = 0$ for $n > 1$. The group $(b^{(i)})$ is seen to consist of
$(1, .5)$ and $(1, 2)$, the former sequence being the regular one. Now, the system (97)
gives $a_1 = -.5$, while the relations corresponding to (96) show that $a_n = (-.5)^n$. Considering
the general formula for the regular process,

$\{\zeta(t)\} = \{\eta(t)\} + .5 \{\eta(t-1)\},$

it follows that

$\{\eta(t)\} = \{\zeta(t)\} - .5 \{\zeta(t-1)\} + (.5)^2 \{\zeta(t-2)\} - (.5)^3 \{\zeta(t-3)\} + \cdots,$

in full agreement with (200). Observing that $K^2 = 1.25$, and that $(K^{(1)})^2 = 5$, formula
(258) gives for the remaining process in the group $(\zeta^{(i)})$

(264)   $\{\zeta^{(1)}(t)\} = .5 \{\eta(t)\} + \{\eta(t-1)\}.$

According to the general analysis, the residual $\{\eta^{(1)}(t)\}$ secured by theorem 6 is obtained
by replacing $\zeta$ by $\zeta^{(1)}$ in (200). As is readily verified, the corresponding
representation of type (260) reads

$\eta^{(1)}(t) = \tfrac{1}{2} \eta(t) + \tfrac{3}{4} \eta(t-1) - \tfrac{3}{8} \eta(t-2) + \tfrac{3}{16} \eta(t-3) - \cdots.$

We obtain, in analogy with (264), and in harmony with the general formula (261)

$\{\zeta^{(1)}(t)\} = \{\eta^{(1)}(t)\} + .5 \{\eta^{(1)}(t-1)\} = .5 \{\eta(t)\} + \{\eta(t-1)\}.$

It can be easily verified that $D^2(\eta^{(1)}(t)) = D^2(\eta(t))$, and that the process $\{\eta^{(1)}(t)\}$ is
non-autocorrelated, i.e. that $r_k(\eta^{(1)}) = 0$ for $k \ne 0$.
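
The verification mentioned above may be carried out numerically. The sketch below is a hedged illustration with hypothetical function names. It forms the weights $(a)$ as the formal inverse of the regular sequence, multiplies them with the non-regular sequence, rescales by the ratio of the two $K$-values, and then checks the dispersion and the first autocovariances of the resulting weights. The primary series is assumed to have unit variance, and the infinite sum is truncated after 60 terms.

```python
import math


def inverse_weights(b, n_terms):
    """a_0 = 1, a_1, ... such that (1 + b_1 z + ...)(1 + a_1 z + ...) = 1 formally."""
    a = [1.0]
    for n in range(1, n_terms):
        a.append(-sum(b[j] * a[n - j] for j in range(1, min(n, len(b) - 1) + 1)))
    return a


def residual_weights(b_regular, b_other, n_terms):
    """Weights of eta(t), eta(t-1), ... in the residual of the non-regular average."""
    a = inverse_weights(b_regular, n_terms)
    k_ratio = math.sqrt(sum(c * c for c in b_regular) / sum(c * c for c in b_other))
    return [k_ratio * sum(a[n - j] * b_other[j]
                          for j in range(min(n, len(b_other) - 1) + 1))
            for n in range(n_terms)]


def autocovariance(weights, lag):
    """Sum of products of weights `lag` steps apart (unit-variance primary series)."""
    return sum(weights[j] * weights[j + lag] for j in range(len(weights) - lag))


c = residual_weights([1.0, 0.5], [1.0, 2.0], 60)
print(c[:4])                  # leading weights of the residual
print(autocovariance(c, 0))   # squared dispersion of the residual
print(autocovariance(c, 1))   # lag-1 autocovariance, expected to vanish
```

The same two functions apply to any pair of sequences from one group, provided the inverse weights decay fast enough for the truncation to be harmless.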

3) Next, let $r_1 = \tfrac{1}{6}$, $r_2 = -\tfrac{1}{3}$; $r_k = 0$ for $k > 2$.

The relation (256) reads in this case

$\tfrac{2}{3} (x^2 + .5x - .5)(-.5x^2 + .5x + 1) = -\tfrac{1}{3} x^4 + \tfrac{1}{6} x^3 + x^2 + \tfrac{1}{6} x - \tfrac{1}{3}.$

Paying regard to the identity $x^2 + .5x - .5 = (x - .5)(x + 1)$, a short calculation will
show that the group $(b^{(i)})$ consists of two sequences, viz. $(1, .5, -.5)$, which is the
regular one, and $(1, -1, -2)$. The general formula for a corresponding regular
process reads

$\{\zeta(t)\} = \{\eta(t)\} + .5 \{\eta(t-1)\} - .5 \{\eta(t-2)\}.$

We find without difficulty 

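The non-regular process is given by

$\{\zeta^{(1)}(t)\} = .5 \{\eta(t)\} - .5 \{\eta(t-1)\} - \{\eta(t-2)\} =$

$= \{\eta^{(1)}(t)\} + .5 \{\eta^{(1)}(t-1)\} - .5 \{\eta^{(1)}(t-2)\}.$

Formula (260) gives for the non-autocorrelated residual

$\eta^{(1)}(t) = \tfrac{1}{2} \eta(t) - \tfrac{3}{4} \eta(t-1) - \tfrac{3}{8} \eta(t-2) - \tfrac{3}{16} \eta(t-3) - \cdots.$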


4) A few remarks in connexion with the illustrations concluding Chapter II will 
be sufficient to show how the general formulae given in the present chapter will 
work in the case of a normal process of moving averages. The developments below 
cover both processes of linear autoregression and processes of moving averages. 

The matrix of the infinite quadratic form appearing in the characteristic function 
of a non-autocorrelated normal process $\{\eta\}$ with dispersion 1 is nothing else than
the unit matrix. Now, keeping in mind the substitution procedure indicated on 
p. 90, the relations (205) will verify the product formula already given on p. 01, 


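$$
\begin{pmatrix} 1 & b_1 & b_2 & \cdots \\ 0 & 1 & b_1 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
\cdot
\begin{pmatrix} \varkappa^2 & 0 & 0 & \cdots \\ 0 & \varkappa^2 & 0 & \cdots \\ 0 & 0 & \varkappa^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 & \cdots \\ b_1 & 1 & 0 & \cdots \\ b_2 & b_1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
=
\begin{pmatrix} 1 & r_1 & r_2 & \cdots \\ r_1 & 1 & r_1 & \cdots \\ r_2 & r_1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix},
$$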


which yields the matrix belonging to a normal process of finite or infinite moving
averages. The previous illustrations exemplify the fact that there are, in general,
several sequences $(b)$ giving rise to the same matrix in the right member.
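
A finite section of the product formula can be checked directly. The sketch below is an illustration under stated assumptions: the scalar on the diagonal of the middle factor is taken to be $1/K^2$, with $K^2$ the sum of squared weights, and the upper-triangular factor is cut to $n$ rows and $n + h$ columns so that the $n$-by-$n$ section of the product is exact. Function names are hypothetical.

```python
def banded_rows(b, n):
    """n rows of the upper-triangular factor: row i carries (1, b_1, ..., b_h) from column i on."""
    h = len(b) - 1
    return [[b[c - i] if 0 <= c - i <= h else 0.0 for c in range(n + h)]
            for i in range(n)]


def correlation_section(b, n):
    """n-by-n section of (upper factor) * (1/K^2) * (transpose of upper factor)."""
    rows = banded_rows(b, n)
    k2 = sum(x * x for x in b)  # assumed value of the reciprocal of the diagonal scalar
    return [[sum(p * q for p, q in zip(rows[i], rows[j])) / k2 for j in range(n)]
            for i in range(n)]


# The two sequences of illustration 2 give the same matrix of autocorrelation coefficients.
for seq in ([1.0, 0.5], [1.0, 2.0]):
    for row in correlation_section(seq, 3):
        print([round(v, 4) for v in row])
    print()
```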
The general inversion formula (200) implies



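The verification is immediate. In fact, paying regard to (208) and (210) we obtain
in the first place

$$
\begin{pmatrix} 1 & r_1 & r_2 & \cdots \\ r_1 & 1 & r_1 & \cdots \\ r_2 & r_1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 & \cdots \\ a_1 & 1 & 0 & \cdots \\ a_2 & a_1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}
=
\begin{pmatrix} \varkappa^2 & \varkappa^2 \cdot b_1 & \varkappa^2 \cdot b_2 & \cdots \\ 0 & \varkappa^2 & \varkappa^2 \cdot b_1 & \cdots \\ 0 & 0 & \varkappa^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.
$$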

Premultiplying this relation with the first factor of (265), and keeping in mind the
relations (201), we arrive at (265).

5) Let $p_i$ be the multiplicity of a real root or of a conjugate complex root-pair of
(251). Then the number of averages in the same group as (249) is $\prod_i (p_i + 1)$, the
index $i$ running over those roots and root-pairs which are $\ne 1$ in modulus.



CHAPTER IV.


On the application of some stationary schemes. 

27. Preliminary remarks. Disposition. 

In applied time series analysis, the chief problem is to find a 
hypothetical scheme which from a theoretical viewpoint is appro- 
priate to the phenomenon considered, and gives a satisfactory fit 
to the observational data. Another desideratum is, of course, that 
the hypothesis be as simple as possible. 

As shown in detail in sections 15 and 16, the general stationary 
process embraces all of the hypotheses about time series surveyed 
in Chapter I. The wide scope of the stationary process is due to 
the fact that the restrictions are reduced to a minimum: Besides 
the indispensable postulate that the probability laws must not 
contradict themselves (see (53) — (54)), the only further assumption 
is that time itself will not influence these probability laws (see 
(55)). In other words, time is thought of as a passive medium; 
roughly speaking, this means that any prognosis based upon the 
past development will depend only on this same development — 
i.e. supposing that the same development had taken place with a 
constant lag, the corresponding forecast would not differ, apart 
from the displacement in time. 

It stands to reason that the assumption of stationarity is leg- 
itimate in the most varied fields of scientific research. Then if we 
restrict the analysis to equidistant time points, we have at our 
disposal all of the schemes falling under the discrete stationary 
process. Regarding the simplification of treating time as a discrete 
parameter, it needs no comment that this device is appropriate in 
most if not all practical applications (cf. p. 70 f). 

From the viewpoint of the theory of probabilities, the purely 
random process is the simplest type case of a stationary scheme. 
In sections 15 and 16, certain other type cases were constructed 
on the basis of the purely random process, and the scheme of
hidden periodicities was interpreted as a stationary process. The 
special theoretical models mentioned are rather simple, in as much 
as an adequate description of their structure is possible by means 
of linear methods, such as the periodogram analysis and the analy- 
sis of the graph of autocorrelation coefficients. 

The present chapter is reserved for some applications of the 
above-mentioned simple schemes, in particular the schemes of linear 
regression. It will be seen that the results obtained are rather 
promising. We advance also that certain points in the applications 
will give rise to theoretical discussions — the analysis in the pre- 
vious chapters was chiefly concerned with the structural properties 
of the hypothetical models, while questions bearing upon their 
application were touched upon only incidentally. 

Of course, in applying different hypothetical schemes to observa- 
tional data, different methods are required. 

In the search for a hypothetical model suitable to a stationary 
phenomenon, the construction of an empirical periodogram is a 
classical method of fundamental importance. A careful periodo- 
gram construction is a safe method for discovering hidden periodici- 
ties if such are really present. On the other hand, approximate 
methods often involve definite dangers. For instance, the Bruns- 
Oppenheim method (see section 5) fails as often as the periodic 
elements are covered by a random component. The bias in ques- 
tion, which was already observed by J. I. Craig (1916), is of 
interest also in view of other methods of analysis. For this reason, 
the »Craig effect» will be examined in some detail. This is done
in the next section. 

As shown in detail in Chapter III, periodogram analysis is an 
inadequate method of research in the cases of linear autoregression 
and of moving averages. The present study being focussed on these 
schemes, we have to look for appropriate substitutes for the periodo- 
gram method. In the memoir where G. U. Yule (1927) introduces 
the scheme of linear autoregression, an empirical parallel to the 
autoregression analysis as developed in section 19 forms the leading 
method of research. This method yields a first substitute for the 
periodogram construction. Next, as emphasized by Sir G. Walker 
(1931), the autocorrelation coefficients behave quite differently in 
the schemes of hidden periodicities and linear autoregression. Hence, 
the graph of the serial coefficients will yield important information 
about the nature of the phenomenon considered. The analysis
in section 26 shows that the scheme of moving averages, too, 
presents a characteristic graph of autocorrelation coefficients, a 
circumstance which augments the importance of the method proposed 
by Walker. For the sake of brevity in writing, the graphs of 
serial and autocorrelation coefficients will be termed correlograms 
(empirical and hypothetical respectively).

In the following, the methods proposed by G. U. Yule (1927)
and Sir G. Walker (1931) will be carried further on the basis of 
the theoretical investigations in the previous chapters, and used 
in the applications to empirical data. A critical survey of the 
original methods of Yule and Walker will be given in section 29. 
In section 30 follow a summary and a critical examination of the 
modified methods. 

The two sections concluding the present study are reserved for 
applications of the scheme of moving averages and the scheme of 
linear autoregression. Both these schemes of linear regression may 
be attached directly to familiar lines of time series analysis. 
Returning to this point later, especial attention will be drawn to 
the different type of forecast yielded by these schemes as compared 
with the scheme of hidden periodicities. While the forecasts 
obtained in the latter scheme cover an infinite future, the former 
schemes will yield efficient forecasts only over a limited period of 
time. However, this limitation is outweighed by the greater effi- 
ciency in the short time forecasts yielded by the schemes of linear 
regression. 

Finally, in discussing the applications, certain generalizations of 
the schemes of linear regression will be touched upon. 


28. On the CRAIG effect. 

In section 5 we have surveyed a few methods for separating the 
individual components in a sum of harmonics (17). The classical 
method being the construction of a periodogram, the short cut 
indicated by S. Oppenheim (1909) (see p. 18) is based on the 
difference relations satisfied by the function (17). 

The scheme of hidden periodicities consists of a composed har- 
monic on which a random component is additively superposed. 
Even in this case a periodogram will point out the frequencies of
the individual harmonics. But as emphasized by J. I. Craig (1916), 
the Oppenheim method will be biassed by the random component. 
Proceeding to an examination of this bias, which will be termed 
the »Craig effect», it will be sufficient for our purpose to consider
a scheme of simple structure. In doing this, we shall regard the 
composed harmonic as a sample series of a singular process. This 
device, which will not affect the proof, is used merely in order to 
illustrate the connexion between the scheme of hidden periodicities 
as defined in section 8 and the process of hidden periodicities as 
defined in section 1 5 Si. 

Let $\{\psi(t)\}$ stand for a singular process satisfying the relation

$\Delta^2 \psi(t-1) + k^2 \cdot \psi(t) = 0, \quad 0 < k < 2.$

Let $\{\eta(t)\}$ be purely random, and independent of $\{\psi(t)\}$. Further let
$\{\eta(t)\}$ have a finite dispersion and a vanishing mean, and let a
process of hidden periodicities be defined by

(266)   $\{\xi(t)\} = \{\psi(t)\} + \{\eta(t)\}.$

Denoting by $\psi_i(t, t-1, \ldots, t-n) = [\psi_i(t), \psi_i(t-1), \ldots, \psi_i(t-n)]$
a sample series section of the process $\{\psi(t)\}$, we have

$\Delta^2 \psi_i(t-s-1) + k^2 \psi_i(t-s) = E[\Delta^2 \psi(t-1) + k^2 \psi(t)] = 0.$

Now, let $k^2$ be unknown. The value delivered by the Oppenheim
method is that minimizing the expression

$\frac{1}{n-1} \sum_{s=1}^{n-1} [\Delta^2 \psi_i(t-s-1) + k^2 \psi_i(t-s)]^2.$

A short reduction leads to the following value which is obviously
unbiassed,

(267)   $k^2 = - \sum_{s=1}^{n-1} \psi_i(t-s) \cdot \Delta^2 \psi_i(t-s-1) \Big/ \sum_{s=1}^{n-1} \psi_i^2(t-s) =$
        $= - E[\psi(t) \cdot \Delta^2 \psi(t-1)] / E[\psi^2(t)].$

According to (31), the frequency $\lambda$ of the harmonic $\psi_i(t)$ is given by

(268)   $\lambda = 2 \arcsin (k/2).$
Let it next be assumed that a sample series $\xi_i(t, t-1, \ldots, t-n)$
is given, and that we know neither $k^2$ nor the sample series $\psi_i$
and $\eta_i$. The assumption concerning an additive random element
corresponds to the actual situation when seeking for a period in
an empirical time series. Applying now the Oppenheim method,
we must minimize

(269)   $\frac{1}{n-1} \sum_{s=1}^{n-1} [\Delta^2 \xi_i(t-s-1) + \bar{k}^2 \xi_i(t-s)]^2,$

which gives (cf. (267))

(270)   $\bar{k}^2 = - \sum_{s=1}^{n-1} \xi_i(t-s) \cdot \Delta^2 \xi_i(t-s-1) \Big/ \sum_{s=1}^{n-1} \xi_i^2(t-s).$

This expression depends on the actual path of the sample series
$\xi_i(t, t-1, \ldots, t-n)$, but a sufficient approximation is delivered by
the $\bar{k}^2$-value minimizing the expression

(271)   $E_i [\Delta^2 \psi(t-1) + \bar{k}^2 \psi(t)]^2 + E [\Delta^2 \eta(t-1) + \bar{k}^2 \eta(t)]^2,$

where we have written $E_i \{f[\psi]\}$ for $\lim_{n \to \infty} \frac{1}{n-1} \sum_{s=1}^{n-1} f[\psi_i(t-s)]$.

A short calculation shows that this approximation reads

(272)   $\bar{k}^2 \approx [- E_i \psi(t) \cdot \Delta^2 \psi(t-1) + 2 E \eta^2(t)] / [E_i \psi^2(t) + E \eta^2(t)].$

The difference between on one hand (270) and (272) and on the
other hand the unbiassed formula (267) gives rise to the Craig
effect. Formula (272) shows that this effect will depend on the
chance-determined amplitude of the harmonic constituting $\psi_i(t, t-1, \ldots, t-n)$.

It is seen that the $\bar{k}^2$-value given by (272) equals $k^2$ when, and
only when, $k^2 = 2$, i.e. when $\psi(t)$ has the period 4 time units.
A brief reflection shows further that the Oppenheim method will
under-estimate the period if this is above 4 time units, while the
reverse will be true if the period is below 4 time units. Moreover,
the larger the variance of the random component the larger is
the Craig effect.
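
The bias may be made visible on simulated data. The following sketch is an illustration only. It assumes a single harmonic of period 10 with mean square 1, a Gaussian random component of variance .5, and reads $\Delta^2 \xi(t-1)$ as the central second difference $\xi(t+1) - 2\xi(t) + \xi(t-1)$; none of these choices, nor the function names, are taken from the text.

```python
import math
import random


def oppenheim_k2(series):
    """Estimate of type (270): minus sum of x(t) * second difference, over sum of x(t)^2."""
    inner = range(1, len(series) - 1)
    num = sum(series[t] * (series[t + 1] - 2.0 * series[t] + series[t - 1]) for t in inner)
    den = sum(series[t] ** 2 for t in inner)
    return -num / den


def period_from_k2(k2):
    """Period 2*pi/lambda with lambda = 2*arcsin(k/2), cf. (268)."""
    return math.pi / math.asin(math.sqrt(k2) / 2.0)


random.seed(1)  # arbitrary seed, for reproducibility only
true_period, noise_var, n = 10.0, 0.5, 20000
lam = 2.0 * math.pi / true_period
psi = [math.sqrt(2.0) * math.cos(lam * t + 0.3) for t in range(n)]  # mean square 1
xi = [p + random.gauss(0.0, math.sqrt(noise_var)) for p in psi]

k2_true = 2.0 - 2.0 * math.cos(lam)
k2_pred = (k2_true + 2.0 * noise_var) / (1.0 + noise_var)  # approximation of type (272)

print("period, harmonic alone:   ", period_from_k2(oppenheim_k2(psi)))
print("period, with random part: ", period_from_k2(oppenheim_k2(xi)))
print("period expected from (272):", period_from_k2(k2_pred))
```

With the true period above 4 time units, the period obtained from the disturbed series comes out too short, in line with the remark above.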

Illustrations. Two simple examples of the Craig effect will be given on the basis
of the two model series presented in table 4. Applying the relations
(270), (268) and (18), the following results were obtained, the sums running from
$t = 2$ to $t = 999$.

Model series    $\sum \xi_i(t) \cdot \Delta^2 \xi_i(t-1)$    $\sum \xi_i^2(t)$    $\bar{k}^2$    $\lambda$    $p$
First                 -4041                    1112          3.6340      144.8°      2.486
Second                -5568                    1711          3.2542      128.8°      2.795

The correct value of the period being 2 units in both the time series, the Craig
effect is seen to be substantial. The values $\bar{k}^2$ obtained are in good agreement with
the approximate formula (272). In fact, observing that in each of the two cases
$E_i[\psi(t) \cdot \Delta^2 \psi(t-1)] = -4$, and that the variances of the random components are
respectively .2 and .6, we get in the first case $\bar{k}^2 \approx 4.4/1.2 = 3.667$, and in the second
$\bar{k}^2 \approx 5.2/1.6 = 3.25$. In full agreement with the remarks attached to formula
(272), the period is over-estimated, and the Craig effect is larger in the second model
series than in the first.
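
The arithmetic of this illustration is easily retraced. The sketch below evaluates the approximation (272) under the stated values (mean square of the harmonic equal to 1, so that $-E_i[\psi \cdot \Delta^2 \psi] = k^2 = 4$) and converts the tabulated $\bar{k}^2$ into periods by means of (268). The function names are hypothetical, and the periods should agree with the tabulated ones only up to rounding of the intermediate frequency.

```python
import math


def craig_k2(k2, noise_var, signal_ms=1.0):
    """Approximation of type (272), using -E_i[psi * second difference] = k2 * E_i[psi^2]."""
    return (k2 * signal_ms + 2.0 * noise_var) / (signal_ms + noise_var)


def period_from_k2(k2):
    """Period 2*pi/lambda with lambda = 2*arcsin(k/2), cf. (268)."""
    return math.pi / math.asin(math.sqrt(k2) / 2.0)


for noise_var, k2_observed in [(0.2, 3.6340), (0.6, 3.2542)]:
    print("variance", noise_var,
          "| (272):", craig_k2(4.0, noise_var),
          "| period from tabulated value:", period_from_k2(k2_observed))
```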


In the above analysis of the scheme (266) we have examined the
Oppenheim method for determining the parameter $k$ in an approach
of type $\Delta^2 \xi(t-1) \sim -k^2 \xi(t)$ or $\xi(t) \sim (2 - k^2) \cdot \xi(t-1) - \xi(t-2)$. We
shall next apply the same method in starting from a more general
approach, viz.

(273)   $\xi(t) \sim \bar{a}_1 \cdot \xi(t-1) + \bar{a}_2 \cdot \xi(t-2).$

Letting as before $\xi_i(t, t-1, \ldots, t-n)$ represent a sample series
section connected with the scheme (266) of hidden periodicities, we
must in the present case minimize (cf. (269))

(274)   $\frac{1}{n-1} \sum_{s=0}^{n-2} [\xi_i(t-s) - \bar{a}_1 \cdot \xi_i(t-s-1) - \bar{a}_2 \cdot \xi_i(t-s-2)]^2$

in respect of $\bar{a}_1$ and $\bar{a}_2$. Now, using an approximation of the same
type as in (272), and paying regard to (46), we get

(275)   $r_1 - \bar{a}_1 - \bar{a}_2 \cdot r_1 \approx 0, \quad r_2 - \bar{a}_1 \cdot r_1 - \bar{a}_2 \approx 0,$

having written ?•* = cos Xh/{ 1 + dj). Here X is the frequency of a 
sample series connected with the singular process {xp(t)}> and 
di — D 2 (rj}/ L \ (xp), where Di (ip) is the dispersion in the sample series 
$\psi_i(t, t-1, \ldots)$. Solving (275), we obtain

(276)   $\bar{a}_1 \approx 2 \cdot \frac{\sin^2 \lambda + d_i^2}{\sin^2 \lambda + 4 d_i^2 + 4 d_i^4} \cdot \cos \lambda; \quad \bar{a}_2 \approx - \frac{\sin^2 \lambda - 2 d_i^2 \cdot \cos 2\lambda}{\sin^2 \lambda + 4 d_i^2 + 4 d_i^4}.$

The minimizing of (274) is obviously analogous to that phase of
the autoregression analysis described in section 19, where $\xi(t)$ is
linearly approximated by $\xi(t-1)$ and $\xi(t-2)$. It is in view of
this analogy that the above formulae are of interest. Now, if
$D(\eta) = 0$, we obtain from (276) the coefficients $\bar{a}_1 = 2 \cos \lambda$ and
$\bar{a}_2 = -1$ appearing in the identity (cf. (35))

$\xi(t) = \psi(t) = 2 \cos \lambda \cdot \psi(t-1) - \psi(t-2).$

It should further be observed that we have in this case $\cos \lambda =
\bar{a}_1 / (2 \sqrt{-\bar{a}_2})$. On the other hand, if $D(\eta) \ne 0$, the latter relation
will be disturbed by a Craig effect. In fact, we get from (276)


(277)   $\cos \bar{\lambda} \approx \frac{(\sin^2 \lambda + d_i^2) \cdot \cos \lambda}{\sqrt{(\sin^2 \lambda - 2 d_i^2 \cdot \cos 2\lambda)(\sin^2 \lambda + 4 d_i^2 + 4 d_i^4)}}.$

Approximating the period $p$ of $\psi(t)$ by means of the biased frequency
$\bar{\lambda}$ given by (277), the Craig effect is seen to be particularly large
if $\sin \lambda$ is small. It would serve no purpose to discuss the sign of
the deviation or to enter into details on a singular process $\{\psi(t)\}$
of general structure.

Illustration. Considering the first of the model series dealt with in the previous
illustration, we have found by minimizing (274) $-910 = \bar{a}_1 \cdot 1114 - \bar{a}_2 \cdot 911$; $930 =
-\bar{a}_1 \cdot 911 + \bar{a}_2 \cdot 1114$. This system gives $\bar{a}_1 = -.4051$, $\bar{a}_2 = .5036$. Since
$a_1 = -2$ and $a_2 = -1$, the method examined is completely misleading. Formula
(276) explains the failure: in the present case $\sin \lambda = 0$, and the resulting $\bar{a}$-values
will show no tendency whatever to approximate the $a$-values sought for. As
is readily verified, (276) gives $-\bar{a}_1 \approx \bar{a}_2 \approx .4167$.
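
The two normal equations quoted in this illustration can be solved by Cramer's rule. The sketch below uses only the four sums given in the text; the function name is hypothetical. For comparison it also prints the coefficients $2 \cos \lambda$ and $-1$ of the undisturbed identity for a harmonic of period 2, i.e. $\lambda = \pi$.

```python
import math


def solve_2x2(m11, m12, m21, m22, y1, y2):
    """Solve m11*u + m12*v = y1, m21*u + m22*v = y2 by Cramer's rule."""
    det = m11 * m22 - m12 * m21
    return (y1 * m22 - m12 * y2) / det, (m11 * y2 - m21 * y1) / det


# -910 = a1 * 1114 - a2 * 911 ;  930 = -a1 * 911 + a2 * 1114
a1_bar, a2_bar = solve_2x2(1114.0, -911.0, -911.0, 1114.0, -910.0, 930.0)
print("fitted coefficients:     ", a1_bar, a2_bar)
print("undisturbed coefficients:", 2.0 * math.cos(math.pi), -1.0)
```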

Summing up the above analysis, we conclude that the Oppenheim 
method yields no adequate substitute for a periodogram construction 
in case the periodicities are covered by a random element, and it 
does not seem worth the trouble to derive modifications neutralizing 
the bias involved. 


29. On earlier applications of the scheme of linear autoregression.

As far as I am aware, there are only two earlier investigations 
which present direct applications of the schemes of linear regression 
to empirical data, viz. those already referred to by G. U. Yule 
(1927) and Sir G. Walker (1931). These memoirs are concerned 
with sunspots and air pressure respectively, and in both cases it is 
the scheme of autoregression that is applied. An examination of 
the main lines of these investigations in the light of the previous 
analysis will be given in the present section. For the sake of 
completeness, we shall also touch upon a passage in the already 
mentioned study on expectance theory by K. Stumpff (1936), where 
an empirical correlogram is dealt with by use of a method related 
to that proposed by Sir G. Walker. 

The memoir of G. U. Yule (1927) starts with a discussion of a
model series, say $\zeta_t$, constructed on the basis of a relation of type

(278)   $\zeta_t - k \zeta_{t-1} + \zeta_{t-2} = \eta_t, \quad -2 < k < 2.$

The purely random series $\eta_t$ is obtained by dice-throwing. The constant
$k$ is chosen in the interval $(-2, 2)$, which implies that the
roots of the characteristic equation of (278) are complex, and of
modulus unity. Thus it follows from the general analysis in section
22 that a process $\{\zeta(t)\}$ corresponding to (278) is non-stationary.
However, the evolutive tendency is rather weak, and the 300 elements
constituting Yule's model series actually present fluctuations
of a stationary appearance.

The parameter $k$ in the model series $\zeta_t$ being chosen so as to
correspond to a period of 10 time units, Yule lays stress upon the
structural resemblance between his model series and the yearly
index of sunspots. Pursuing this suggestion in the later sections
of his memoir, he gives two different methods for a refined analysis
of the structure of the index. Yule works on the A. Wolfer
index 1751-1923.
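
The statements about the characteristic roots can be checked for a concrete value of the constant. The sketch below assumes that the characteristic equation of (278) is $x^2 - kx + 1 = 0$ and that a period of $p$ time units corresponds to $k = 2 \cos(2\pi/p)$; both are illustrative readings, and the function name is hypothetical.

```python
import cmath
import math


def characteristic_roots(k):
    """Roots of x^2 - k*x + 1 = 0 (assumed characteristic equation of the relation)."""
    disc = cmath.sqrt(k * k - 4.0)
    return (k + disc) / 2.0, (k - disc) / 2.0


k = 2.0 * math.cos(2.0 * math.pi / 10.0)  # assumed link between k and a period of 10 units
for root in characteristic_roots(k):
    modulus = abs(root)
    period = 2.0 * math.pi / abs(cmath.phase(root))
    print("modulus:", modulus, "period:", period)
```

A modulus of exactly one means that the intrinsic oscillation is undamped, which is the source of the evolutive tendency mentioned above.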

In his first approach, Yule starts from the hypothesis that the
sunspot index, say $\zeta_t$, satisfies a relation of type (278). In order
to determine $k$, he minimizes the sum of the squared »disturbances»
$\eta_t$. Interpreting the relation (278) as ruling the movement of a
pendulum subjected to random shocks, Yule derives the intrinsic
period of the pendulum which would correspond to the $k$-value
obtained. The period thus derived being too short, viz. $p = 10.08$
years, he finds that the hypothesis (278) gives a better value, $p = 11.03$
years, when applied after graduating the sunspot numbers.

Having thus assumed a linear regression (278) between $\zeta_t + \zeta_{t-2}$
and $\zeta_{t-1}$, Yule applies a graphic test of this approach. Referring
to the graphs given on p. 277, he says on the same page: »On the
whole, however, divergence from linearity does not look as if it
would be a serious trouble».

In his second approach, Yule starts from the relation

(279)   $\zeta_t + a_1 \zeta_{t-1} + a_2 \zeta_{t-2} = \eta_t.$

Proceeding as in the case (278), he determines the parameters $(a)$
by minimizing $\sum \eta_t^2$, and interprets the results by the use of the
analogy with a pendulum. The values found for $a_1$ and $a_2$ correspond
to a damped intrinsic oscillation. The ungraduated index
gives the period $p = 10.600$ years, while the graduated index as
before gives a better value, $p = 11.164$ years.

The graduated index gives rise to a smaller variance in the
disturbances $\eta_t$ than the ungraduated index. Dividing the variance
of $\eta_t$ by the variance of the sunspot index under analysis, the
approach (278) gives .243 for the ungraduated index, and .115 for
the graduated one. The corresponding values in the approach (279)
are .198 and .102 respectively.

In applying generalized hypotheses of type (278) and (279), Yule
finds that the introduction of more parameters does not bring on
a marked decrease in the variance of the disturbances. In other
words, the experiments »fail to suggest the presence of any period
other than the fundamental, a conclusion entirely in accord with
the work of Larmor and Yamaga» (p. 295).

In a summary, Yule suggests that the sunspot numbers should
»be regarded as analogous to the data that would be given by
observations of a disturbed periodic movement, such as that of a
pendulum subjected to successive small random impulses» (p. 294).
Let us discuss this hypothesis in the light of the previous analysis.
As already pointed out, the approach (278) does not correspond
to a stationary process, for if the series $\eta_t$ were purely random, the
secondary model series $\zeta_t$ would present oscillations increasing in
amplitude with time, i.e. be evolutive. In view of this observation,
it is not surprising that the disturbances $\eta_t$ as calculated from
(278) on the basis of the $k$-value previously determined, form a
series of non-random character. Quoting Yule, the series $\eta_t$ shows
»a tendency for positive disturbances during the approach to the
maximum of the sunspot numbers, negative during the approach
to minimum» (p. 294 f.).

We have already seen that the approach (279) gives rise to a
somewhat smaller variance in the disturbances $\eta_t$. However, having
introduced a second parameter, the slight reduction in the variance
of $\eta_t$ does not give a sufficient reason for preferring (279) over
(278). On the other hand, (279) corresponds to a proper stationary
process, a circumstance speaking in favour of this approach.

In itself, the outcome of significant constants $a_1$ and $a_2$ does
not imply that the approach (279) is adequate. Without further
evidence, we cannot even conclude that (279) is better than the
hypothesis of a strictly periodic component in the sunspot index.
In fact, our analysis in the previous section has shown that an
autoregression analysis will give rise to non-vanishing coefficients
$(a)$ also in the case of hidden periodicities. It is interesting to
notice that even the effect of the graduation, viz. the increase in the
period, might be explained by assuming the index to be ruled
by a scheme of hidden periodicities (cf. p. 137). However, a
sufficient reason for rejecting the latter hypothesis is that the
deduction of a strictly periodic component actually leaves an
»error» with a dispersion substantially above that of the disturbances
obtained from (279). In fact, the assumption of one harmonic
component in the sunspot index would explain at most 28 % of the 
variance in the index (see e. g. K. Stumpff (1937), p. 126). On 
the other hand, we have seen that the approach (279), which 
contains only two parameters, is able to explain at least 80 % of 
the variance in question. In this connexion it is rather interesting 
to notice that the periodogram of the sunspot index given by 
Stumpff (1. c.) bears a certain resemblance to our fig. 4 (p. 116), 
and thus agrees with the hypothesis of linear autoregression. (Cf. 
also the remarks attached to (131)). 

The values found by Yule for the parameters in (279) are
a_1 = −1.34254, a_2 = 0.65504 for the ungraduated, and a_1 = −1.51527,
a_2 = 0.80245 for the graduated sunspot index. Using the analogy
of a swinging pendulum, these values correspond to a rather heavy
damping: in the case a_2 = 0.80245, the amplitude of a swing
would be reduced to 29 % in the duration of one period (see
G. U. Yule (1927), p. 282). With this heavy damping, a purely
random series of impulses would not be likely to produce such
large amplitudes as in the sunspot fluctuations. In full agreement
with this argument, which in section 32 will be developed as a
test of the scheme of linear autoregression, the disturbances η_t
calculated from (279) on the basis of the sunspot index present
variations of a non-random type. We conclude that the situation
is not quite covered by Yule's statement that »the disturbances do
occur just in the kind of way that would be necessary to maintain
a damped vibration» (p. 286).
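The figure of 29 % can be checked from the quoted constants alone. The sketch below assumes that (279) has the form ξ_t + a_1 ξ_{t−1} + a_2 ξ_{t−2} = η_t with complex characteristic roots, so that the free swing has modulus √a_2 per time unit; the function name is illustrative only.

```python
import math

def damping_per_period(a1, a2):
    """Free oscillation of x_t + a1*x_{t-1} + a2*x_{t-2} = 0 (complex roots assumed).

    Returns (period in time units, amplitude ratio after one full period).
    """
    modulus = math.sqrt(a2)                    # common modulus of the conjugate roots
    angle = math.acos(-a1 / (2.0 * modulus))   # argument of the roots, in radians
    period = 2.0 * math.pi / angle
    return period, modulus ** period

# Constants quoted for the graduated sunspot index.
period, ratio = damping_per_period(-1.51527, 0.80245)
print(round(period, 2), round(ratio, 2))
```

With the graduated constants this gives a period of roughly eleven years and an amplitude ratio close to the 29 % quoted from Yule.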

Summing up the above discussion, we have seen that in replacing
the approach of strict periodicity by a hypothesis containing an
acting random element, G. U. Yule (1927) obtains a substantially
better fit to the sunspot data. In the terminology of the present
study, the approach (279) as applied to the ungraduated index 
corresponds to a scheme of linear autoregression. On the other 
hand, as applied to the graduated index, it is obvious that the 
model (279) approximately corresponds to the assumption that the 
ungraduated index is ruled by a scheme consisting in a purely 
random process independent of and superposed on a process of 
linear autoregression. 

As mentioned by Yule, the disturbances η_t calculated from (279)
present a certain systematic variation. This non-random behaviour,
which seems to be conditioned by the small value found for a_2 in
(279), remains unexplained by the hypothesis of linear autoregression.
In view of this circumstance, it seems to me as if the sunspot
index calls for further investigation. Perhaps the methods developed
in section 32 would yield a scheme fitting the data better. But
it is also possible that more satisfactory results would be obtained
in an approach involving a non-linear function of the index. Since
to pursue these suggestions falls outside the program of this survey,
we shall end our discussion of the Yule memoir.

Sir G. Walker (1931) follows up the Yule approach (279), and
studies an autoregression relation of type

(280)    ξ_t + a_1 ξ_{t−1} + ... + a_h ξ_{t−h} = η_t.

As mentioned in section 24, Walker finds that the autocorrelation
coefficients corresponding to (280) satisfy the relations (cf. p. 104)

(281)    r_k + a_1 r_{k−1} + ... + a_h r_{k−h} = 0,    k > h.

Assuming that the roots of the characteristic equation (34) are
different, he further gives r_k as the general solution of (32).

We are now in a position to examine Walker's methods for
applying (280) to empirical data. The basic idea is simply to
compute the serial coefficients r̄_k, and to determine the constants
a_n so that the relations (281) will be approximately satisfied when
replacing r_k by r̄_k.

Sir G. Walker works on air pressure data from Port Darwin
1880–1925, taking the quarter of a year for time unit. The graph
of serial coefficients (in our terminology the correlogram)
ranges from k = 1 to k = 147 quarters (p. 531). The graph shows
a rapid decrease from r̄_1 = 0.76 to r̄_6 ≈ 0. For larger k-values, the
correlogram presents fluctuations with rather small amplitudes. In
fact, up to k = 100 all the coefficients r̄_k are less than 0.3 in
modulus.

Sir G. Walker finds that in the interval 0 ≤ k < 40 a fairly good
approximation to the correlogram is yielded by the function

(282)    r_k = 0.19 (0.96)^k cos(πk/6) + 0.15 (0.98)^k + 0.66 (0.71)^k.

This function, which is seen to involve a damped harmonic with
period p = 12 quarters, satisfies the difference equation

r_k − 3.35 r_{k−1} + 4.43 r_{k−2} − 2.71 r_{k−3} + 0.64 r_{k−4} = 0.

Concluding inversely from (281) on (280), Walker finally arrives
at the representation

(283)    ξ_t − 3.35 ξ_{t−1} + 4.43 ξ_{t−2} − 2.71 ξ_{t−3} + 0.64 ξ_{t−4} = η_t.
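The link between the graduating function (282) and the difference equation can be retraced numerically. The sketch below assumes that the components of (282) have the damping factors 0.96 (with angle π/6), 0.98 and 0.71, and multiplies out the corresponding characteristic factors. The helper name is illustrative and not part of the original study.

```python
import cmath
import math

def poly_from_roots(roots):
    """Coefficients (descending powers) of the monic polynomial with the given roots."""
    coeffs = [1.0 + 0j]
    for root in roots:
        new = coeffs + [0j]              # multiply the current polynomial by x ...
        for i, c in enumerate(coeffs):
            new[i + 1] -= root * c       # ... and subtract root times the polynomial
        coeffs = new
    return [c.real for c in coeffs]      # conjugate roots leave real coefficients

# Assumed characteristic roots of the components in (282).
roots = [0.96 * cmath.exp(1j * math.pi / 6),
         0.96 * cmath.exp(-1j * math.pi / 6),
         0.98,
         0.71]
coeffs = poly_from_roots(roots)
print([round(c, 2) for c in coeffs])
```

To two decimals the product reproduces the coefficients 1, −3.35, 4.43, −2.71, 0.64 of the difference equation and hence of (283).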

Proceeding to an examination of Walker's methods, we observe
in the first place that an unconditioned conclusion from (281) on
(280) is not permitted. In fact, the autocorrelation coefficients
corresponding to the relation (280) satisfy not only the relations
(281) but also (222–223), the latter relations not having been
observed by Walker. It is seen that the coefficients r_1, r_2, ..., r_{h−1}
will be uniquely determined by the system (222) in terms of the
coefficients a_n. In other words, the coefficients in (280) determine
not only the periods and the damping factors of the individual
components in the expression (33), but also the coefficients of the
components. Thus, without anything further we cannot be sure
that the autocorrelation coefficients corresponding to the approach
(283) will be given by the graduating expression (282). On the
contrary, the system (222) corresponding to (283) actually gives
rise to autocorrelation coefficients which are substantially different
from (282). Having read off the values r_1 = 0.75, r_2 = 0.55, r_3 = 0.35
from the graph of (282) given by Walker (p. 528), I have found
from (222) the values r_1 = 0.93, r_2 = 0.72, r_3 = 0.43 for the
autocorrelation coefficients belonging to the approach (283).
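The values 0.93, 0.72, 0.43 can be retraced under an assumption about the system (222), which is not reproduced in this part of the text. The sketch takes it to consist of the relations r_k + a_1 r_{|k−1|} + ... + a_h r_{|k−h|} = 0 for k = 1, ..., h−1, with r_0 = 1, and solves the resulting linear system for the coefficients of (283). All function names are illustrative.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def first_autocorrelations(a):
    """r_1..r_{h-1} of xi_t + a[0]*xi_{t-1} + ... + a[h-1]*xi_{t-h} = eta_t.

    Assumed relations: sum_j coef[j] * r_{|k-j|} = 0 for k = 1..h-1, r_0 = 1.
    """
    h = len(a)
    coef = [1.0] + list(a)
    matrix = [[0.0] * (h - 1) for _ in range(h - 1)]
    rhs = [0.0] * (h - 1)
    for k in range(1, h):
        for j, c in enumerate(coef):
            lag = abs(k - j)
            if lag == 0:
                rhs[k - 1] -= c              # the known term r_0 = 1
            else:
                matrix[k - 1][lag - 1] += c
    return solve_linear(matrix, rhs)

r = first_autocorrelations([-3.35, 4.43, -2.71, 0.64])
print([round(v, 2) for v in r])
```

Under this assumption the solution lies within about 0.01 of the values quoted in the text, and well away from the values 0.75, 0.55, 0.35 read off from (282).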

In view of the oversight pointed out, it is not surprising that
the relation (283) gives rise to a larger dispersion in the disturbances
η_t than in the air pressure data ξ_t. As a matter of fact, I have
found D(η) = 2.4 D(ξ), while the result D(η) = 0.28 D(ξ) obtained by
Walker (p. 530) is based on an incorrect use of relation (317)
(see also our foot-note remark on p. 112). We conclude that if
the approach (283) is to be applied to the air pressure data, the
coefficients must be modified. Having stated this, it is rather
interesting that the simple approach

(284)    ξ_t − 0.73 ξ_{t−1} = η_t

gives a fairly good fit to the first few serial coefficients. In fact,
according to (238) the approach (284) gives r_1 = 0.73, r_2 = 0.53,
r_3 = 0.39, r_4 = 0.28, while the air pressure serial coefficients given
by Walker (p. 528) read r̄_1 = 0.76, r̄_2 = 0.56, r̄_3 = 0.36, r̄_4 = 0.18. A
short calculation shows further that (284) gives D(η) = 0.68 D(ξ).
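The figures quoted for the simple approach (284) follow from the standard properties of a first-order autoregression: autocorrelation coefficients r_k = ρ^k and a disturbance dispersion of √(1 − ρ²) times that of the series. A minimal check, with variable names chosen for illustration:

```python
import math

rho = 0.73                                   # coefficient of the approach (284)

# Autocorrelation coefficients r_1..r_4 of the first-order scheme.
model = [round(rho ** k, 2) for k in range(1, 5)]

# Ratio D(eta) / D(xi) of the dispersions.
ratio = math.sqrt(1.0 - rho ** 2)

print(model, round(ratio, 2))
```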

Let us in conclusion attach a few remarks to the empirical
correlogram presented by Sir G. Walker (p. 531). As already
mentioned, the serial coefficients show rather small deviations from
zero in the interval 3 < k < 40. On the other hand, the increase
in amplitude for certain k-values > 40 might be due to the successive
reduction in the number of correlates. Perhaps this argument is
sufficient to explain also why the fluctuations are somewhat larger
in that alternative variant of a correlogram given by Walker,
where all serial coefficients are based on 77 pairs of correlates. As
the fluctuations, furthermore, seem rather irregular and aperiodic
(at least to my eye), it is doubtful whether it would be possible
to improve sensibly the approach (284) by taking into account more
distant elements ξ_{t−2}, ξ_{t−3}, etc. In this connexion it is rather
interesting to notice that according to the general analysis there
exists no process of linear autoregression having (282) for
autocorrelation coefficients. Another reason for resting satisfied with
the simple approach (284) is that the ordinates in the periodogram
presented by Sir G. Walker (p. 526) are all lying on about the
same level; this periodogram does not, like that of the sunspots,
suggest a scheme of linear autoregression with a tendency to
periodicity (cf. p. 142). At any rate, a more detailed analysis of
the air pressure data is beyond the scope of the present survey.

Using formula (127), K. Stumpff (1936) develops a periodogram
theory which generalizes the classical Schuster theory. In applying
his theory, Stumpff works on air pressure data (Potsdam,
1925–1926, equidistance 1 hour), and replaces the coefficients
r_k in (127) by a corresponding set of graduated serial coefficients.
Claiming that the graduated values belong to a scheme of linear
autoregression of type

(285)    ξ(t) − 2p·ξ(t − 1) + p^2·ξ(t − 2) = η(t),

Stumpff makes a mistake similar to Walker's pointed out above.
Proceeding as in the developments on p. 112, I have found the
formula r_k = [1 + k(1 − p^2)/(1 + p^2)]·p^k for the autocorrelation
coefficients belonging to the scheme (285), while K. Stumpff (p. 53)
gives r_k = (1 − k·log p)·p^k.
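The two competing formulae for the scheme (285) can be compared with the autocorrelation coefficients obtained directly by recursion. The sketch assumes the usual relations r_k − 2p r_{k−1} + p² r_{k−2} = 0 for k ≥ 1 with r_0 = 1 and r_{−1} = r_1; the function names and the trial value p = 0.8 are illustrative.

```python
import math

def closed_form(k, p):
    """Formula stated in the text for the double-root scheme (285)."""
    return (1.0 + k * (1.0 - p * p) / (1.0 + p * p)) * p ** k

def stumpff_form(k, p):
    """Formula attributed to Stumpff, with log taken as the natural logarithm."""
    return (1.0 - k * math.log(p)) * p ** k

def by_recursion(n, p):
    """r_0..r_n from r_k = 2p*r_{k-1} - p^2*r_{k-2}, started at lag 1."""
    r = [1.0, 2.0 * p / (1.0 + p * p)]   # lag-1 relation with r_{-1} = r_1
    for k in range(2, n + 1):
        r.append(2.0 * p * r[k - 1] - p * p * r[k - 2])
    return r

p = 0.8
r = by_recursion(10, p)
gap_closed = max(abs(r[k] - closed_form(k, p)) for k in range(11))
gap_stumpff = abs(r[1] - stumpff_form(1, p))
print(gap_closed, gap_stumpff)
```

For this trial value the first formula agrees with the recursion to rounding error, whereas the second already deviates at lag 1.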


30. Preliminary survey of methods. 

In this section we shall give a brief summary of the methods 
used in the later applications and a few critical remarks on the 
scope of these methods. 

As pointed out in earlier sections, a careful analysis of the 
structural properties of a time series requires statistical data 
covering a rather long period. On the other hand, the series must 
not change its general character in the course of the interval of 
observation, for then a stationary scheme would be inadequate. 
For instance, if a trend is present in the material, it should be 
removed before starting the analysis (cf. p. 1). 

Considering a scheme (39) of hidden periodicities, and disregarding
the value r_0 = 1, the correlogram r_k consists of superposed harmonics
such that the periods of the individual components equal those in
the time series considered (cf. (46)). In the two type cases of linear
regression, on the other hand, the correlogram has the horizontal
axis for asymptote. In fact, in the scheme of linear autoregression
the correlogram r_k is a function (33) such that the roots of the
characteristic equation (34) are of modulus less than unity, and in
the scheme of moving averages all the autocorrelation coefficients
are zero beyond a certain k-value (cf. also (177)).

For the sake of concreteness, we show below three hypothetical 
correlograms exemplifying the type cases considered. 



Fig. 6. Correlograms illustrating the schemes of hidden periodicities (thin line),
linear autoregression (broken line), and moving averages (thick line). 


The correlograms in this figure are based on the following parameters. In the 
case of hidden periodicities, the correlogram has been derived from formula (46), 
where we have chosen s ■= (\ ==: 1 ; 7)" (77)= 125; Aj = 7ir/t). The case of linear auto- 
regression is illustrated by a correlogram of type (243), having taken C—V *8, 
X^n/lS. The moving average correlogram, finally, has been obtained from (250), 
putting /* = 4; & a = ’4, 6 S — 3, h±— 2 

Because of the different behaviour of the autocorrelation coefficients
in the schemes mentioned, it may be expected that we would
obtain useful suggestions by inspecting the empirical correlogram
when searching for an adequate scheme to be applied to an observational
time series. For this reason, the construction of an
empirical correlogram is taken as the starting point in the following
applications.

It should be observed that the correlogram construction by formula
(13) involves a relatively small amount of numerical computation.
No trigonometrical or other mathematical tables are required.
Another definite advantage is that the correlogram is obtained
directly from the statistical data, without any preceding preparation
of the material. Accordingly, the empirical correlogram seems
particularly well suited for serving as a first indicator of which
type of scheme to apply to the data.

If the empirical correlogram suggests a scheme of hidden period- 
icities, the next step would be to construct a periodogram for a 
more detailed analysis of possible periodicities in the material under 
investigation. 

Next, if the correlogram suggests a scheme of linear autoregression,
our first problem is to find a scheme (101) such that the corresponding
hypothetical correlogram will fit the empirical one.
The chief difficulty is to derive suitable values for the coefficients
(a); when having arrived at a set of coefficients (a), the corresponding
autocorrelation coefficients will be uniquely determined by
the system (221–222), and the residuals η_t by the relations (280). It
is further a desideratum that these residuals be as small as possible.
Having seen above that these problems are more intricate
than emphasized in earlier studies of the graph of serial coefficients,
it will be found that an empirical autoregression analysis as proposed
by G. U. Yule (1927) will be useful in this connexion.

Finally, it may happen that the empirical correlogram will suggest
a scheme of moving averages. As far as I know, the problem of
fitting this scheme to observational data has not been attacked in
earlier literature. A fundamental problem in this sphere was formulated
by Prof. H. Cramér in his 1933 Course, viz. to find a moving
average with a prescribed correlogram. It will be seen that
the relation (256) gives a starting point for attacking this problem.

Having now given a preliminary survey, the details of the methods 
will be discussed when presenting the results of their application. 
The present section will be concluded by attaching some remarks 
of general scope concerning the limitation of the methods outlined. 

In time series analysis, significance problems are extremely in- 
tricate. In dealing with serial coefficients, we have i. a. to pay 
regard to the fact that their magnitude is conditioned by the size of 
the statistical masses to which they refer. For a discussion of this 
point reference is made to Appendix B in the 1st edition of this 
book (see also p. 110). 

The following applications aim only at illustrating the qualitative
differences between various hypothetical models. Consequently, all
questions about the significance and the interpretation of the quantitative
results fall outside the scope of this study, and again an
explicit warning is given against attaching importance to the numerical
values found for the parameters of the different models
fitted to the observational data.

However, even when limiting the program to an analysis of the 
qualitative structure of a phenomenon as described by a time series, 
we must be cautious when interpreting the results. Stating the 
case briefly: The results will to some extent be conditioned by the 
methods used in the analysis. Of course, in point of principle the 
situation is the same in all applications of hypothetical models to 
empirical data. Let us dwell a moment on some circumstances 
which are peculiar to time series analysis, especially as based on 
the correlogram and the autoregression methods. 

The correlogram sums up the autocorrelation properties of a time
series, and the autoregression analysis, too, is based solely on the
autocorrelation coefficients. We conclude that neither method is
able to distinguish between different schemes with coincident
autocorrelation coefficients. Having already seen examples of this when
dealing with the scheme of moving averages (cf. section 26), further
examples are readily obtained by using non-linear operations in the
construction of stationary processes. For instance, letting {η(t)}
represent a purely random process, it is evident that

(286)    ξ(t) = η(t)·η(t − 1)

will define a stationary process {ξ(t)}. Assuming that E[η(t)] = 0,
a short calculation will show that the process {ξ(t)} is non-autocorrelated.
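A small simulation may make the point about (286) concrete. The sketch below draws a purely random series with zero mean (uniform on (−1, 1), an arbitrary choice made here), forms the product process, and computes a lag-1 serial coefficient for the series and for its squares. The serial coefficient used is a common textbook form and need not coincide in detail with formula (13).

```python
import random

def serial_coefficient(x, lag):
    """Lag-`lag` serial correlation about the overall mean (a common textbook form)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var

rng = random.Random(0)                       # fixed seed for reproducibility
eta = [rng.uniform(-1.0, 1.0) for _ in range(50001)]

# The product process xi(t) = eta(t) * eta(t - 1).
xi = [eta[t] * eta[t - 1] for t in range(1, len(eta))]

r1 = serial_coefficient(xi, 1)
r1_squares = serial_coefficient([v * v for v in xi], 1)
print(round(r1, 3), round(r1_squares, 3))
```

The first coefficient comes out close to zero, as the text asserts, while the squares of neighbouring terms are clearly correlated. The process is thus non-autocorrelated without being purely random, a distinction which the correlogram cannot reveal.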

Thus, if we have found a hypothetical scheme that fits well to 
an empirical correlogram, it is perfectly possible that there are 
other schemes which yield an equally close approximation. When 
it is necessary to choose between different schemes, it may happen 
that theoretical arguments will speak in favour of one of the 
schemes. As exemplified in the applications, the schemes of linear 
regression often seem plausible from theoretical viewpoints, at least 
as a first approximation. On the other hand, a rational choice
between different schemes may be alternatively based on an examination
of other structural properties of the time series than its
serial coefficients. Such lines of research, however, fall outside of
the program of the present study.

Another point which must be kept in mind has special reference
to the autoregression analysis. Having in sections 19 and 20 subjected
a general stationary process to an autoregression analysis, the
investigation resulted in a canonical form for the process considered,
viz. a decomposition in two mutually non-correlated processes, each
of a structure readily comprehended in respect to certain fundamental
properties. Now, even if a complete parallel to this analysis could
be carried through when dealing with empirical data (which of
course is impossible, one reason being the necessity of dealing with
only a finite number of observations), it is not certain that the
autoregression analysis would be an adequate method of research.
In fact, in point of principle an autoregression analysis can reveal
only linear interrelations between the elements in a time series.
For instance, the implicit relation

(287)    ξ(t) = η(t) + p·ξ^2(t − 1),

where {η(t)} is purely random, P[|η(t)| < 1/4] = 1, and |p| < 1, defines
a stationary process {ξ(t)}; a linear autoregression analysis would
here give rise to an infinite sequence of residuals, and to a canonical
representation which is more complicated than the simple relation
(287).

The above argument shows that there is a certain risk of over- 
estimating the outcome of a linear autoregression analysis. When 
proceeding to residuals of higher order, more parameters are in- 
troduced, and it may be that a simpler, possibly non-linear approach 
would give better results. As mentioned before, the use of non- 
linear methods does not fall within the scope of this study. 


31. Some applications of the scheme of moving averages. 

In economic theory, a great deal of interest has recently been 
paid to the schemes of linear regression. However, the discussion 
of the advantages of the new ideas over the hypothesis of strict 
periodicity seems to have been carried on exclusively by general theo- 
retical argumentation, without attempting to fit the recommended 
schemes directly to observational data. Considering, in particular,
the scheme of moving averages, this has not so far as I know been
tried on empirical time series in other fields of scientific research
either. When selecting statistical material in order to give applications
of the previous analysis, I chose economic time series, one
reason being the lack pointed out. In connexion with the
account of these applications given in the sequel, we shall touch 
upon some related lines of economic research where the previous 
developments seem to yield proper tools for a deeper analysis. 

As mentioned in section 9, J. Bartels (1935) has found »quasi-persistent
periodicity» in certain geophysical time series. A periodogram
analysis here being inadequate, these series invite an application
of the schemes of linear regression. Thanks to their at
once flexible and simple construction, these schemes often seem
plausible also a priori. A few arguments on this line will be 
touched upon in the sequel when discussing certain geophysical and 
other phenomena which from a theoretical viewpoint might be 
interpreted by means of the schemes of linear regression. 

The series of yearly wheat prices in Western Europe 1518 — 1869 
compiled by Sir W. Beveridge (1921) was chosen for my earliest 
experiment in applying the correlogram method. The purpose 
being to apply a stationary scheme, the analysis was concerned with 
Beveridge’s trend-free index of fluctuations (p. 449 ff.). In order 
to avoid changes in the structure of the index, the analysis was 
restricted to the last hundred data. An account of the analysis 
follows. 

Having made the inconsequential modification of reducing the 
Beveridge index by 100 units, the time series investigated is given 
in col. (2) of table 7. The first 15 serial coefficients obtained from 
this material with the use of formula (13) read as follows. 


Table 6. Serial coefficients of the Beveridge wheat price index 1770–1869.

r̄_1  =  0.614    r̄_2  =  0.090    r̄_3  = −0.156    r̄_4  = −0.115    r̄_5  = [illegible]
r̄_6  =  0.003    r̄_7  = −0.006    r̄_8  = −0.116    r̄_9  = −0.166    r̄_10 = −0.102
r̄_11 =  0.033    r̄_12 =  0.084    r̄_13 = −0.011    r̄_14 =  0.021    r̄_15 =  0.136


The correlogram based on these coefficients is shown in fig. 7.

It is seen that r̄_1 is rather large, and that all of the following
serial coefficients are lying in the interval −0.17 < r̄_k < 0.14, i. e.
rather close to zero. To my eye, the correlogram definitely suggests
a scheme of moving averages. Accordingly, the next step in the
analysis will be to search for a moving average with autocorrelation
coefficients approximating the serial coefficients under investigation.
As mentioned before, this problem is due to H. Cramér (see p. 123).

Quite generally, the problem before us may be stated as follows.
A set of numbers u_1, u_2, ..., u_h being given, does there exist a moving
average (249) with autocorrelation coefficients r_k such that r_k = u_k
for 1 ≤ k ≤ h? If the answer is in the affirmative, we know from
section 26 that there in general will exist a finite group of moving
averages with the prescribed autocorrelation coefficients, and we
are also in possession of a direct method for determining the coefficients
(b) of these moving averages.

Fig. 7. Correlogram of the Beveridge wheat price index 1770–1869 (thick line),
and hypothetical correlograms corresponding to the approaches (292) and (294)
(broken lines).

Paying regard to the relation (256), we conclude that if there
exists a moving average (249) satisfying our conditions, we must
have

(288)    u(x) = u_h x^h + u_{h−1} x^{h−1} + ... + u_1 x + 1 + u_1/x + ... + u_{h−1}/x^{h−1} + u_h/x^h

              = K^{−2} (x^h + b_1 x^{h−1} + ... + b_{h−1} x + b_h)(b_h + b_{h−1}/x + ... + b_1/x^{h−1} + 1/x^h).

If x_0 is a root of the equation u(x) = 0, then 1/x_0 is another
root. It follows that the substitution z = x + x^{−1} will transform
u(x) into a real polynomial in z of order h, say v(z). Let us write

(289)    v(z) = v_0 z^h + v_1 z^{h−1} + ... + v_{h−1} z + v_h.

The successive calculation of v_0, v_1, ... from the coefficients u_i needs
no comment.
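One way of carrying out this successive calculation rests on the identity x^k + x^{−k} = z·(x^{k−1} + x^{−(k−1)}) − (x^{k−2} + x^{−(k−2)}), which expresses each symmetric pair of terms in u(x) as a polynomial in z. The sketch below is one possible arrangement of the computation and is not taken from the original; the function name is illustrative.

```python
def auxiliary_polynomial(u):
    """Coefficients v_0..v_h (descending powers of z) of v(z), given [u_1, ..., u_h].

    Uses p_0 = 2, p_1 = z, p_k = z*p_{k-1} - p_{k-2}, where p_k = x^k + x^-k.
    """
    h = len(u)
    p = [[2.0], [0.0, 1.0]]                      # ascending powers of z
    for k in range(2, h + 1):
        shifted = [0.0] + p[k - 1]               # z * p_{k-1}
        prev = p[k - 2] + [0.0] * (len(shifted) - len(p[k - 2]))
        p.append([s - q for s, q in zip(shifted, prev)])
    v = [0.0] * (h + 1)
    v[0] = 1.0                                   # the constant term 1 of u(x)
    for k in range(1, h + 1):
        for power, c in enumerate(p[k]):
            v[power] += u[k - 1] * c
    return v[::-1]                               # descending powers, as in (289)

v2 = auxiliary_polynomial([0.614, 0.090])
v4 = auxiliary_polynomial([0.60, 0.09, -0.15, -0.10])
print([round(c, 3) for c in v2])
print([round(c, 3) for c in v4])
```

For h = 2 this gives v(z) = u_2 z^2 + u_1 z + 1 − 2u_2, and for the four prescribed values 0.60, 0.09, −0.15, −0.10 it gives −0.10 z^4 − 0.15 z^3 + 0.49 z^2 + 1.05 z + 0.62.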

It is evident that if z is a root of v(z) = 0, two roots of u(x) = 0
will be obtained from the equation

(290)    P(x, z) = x^2 − zx + 1 = 0.

The roots of this equation are given by

(291)    x = z/2 ± √(z^2/4 − 1).

The product of the roots being unity, we conclude that unless both
the roots are of modulus unity, one of them is situated inside, and
the other outside the unit circle.

Denoting conjugate complexity by an asterisk, we know that if
z is a complex root of v(z) = 0, another root reads z*. Further,
if P(x, z) = 0 has the roots x_1 and 1/x_1, it is evident that P(x, z*) = 0
has the roots x_1* and 1/x_1*. In that case one of the real polynomials
(x − x_1)(x − x_1*) and [x − 1/x_1]·[x − 1/x_1*] must be a factor
in the polynomial

b(x) = x^h + b_1 x^{h−1} + ... + b_{h−1} x + b_h

appearing in (288).

In case v(z) = 0 presents a real root, say z_0, we must distinguish
two cases. If |z_0| > 2, both x_1 and x_2 are seen to be real. Keeping
in mind that either x − x_1 or x − x_2 is a factor in b(x), we conclude
that this case corresponds to the real roots of the equation
b(x) = 0. On the other hand, if |z_0| < 2, we know from (291) that x_1 and
x_2 are conjugate complex, and of modulus unity. The factors
(x − x_1) and (x − x_2) being complex, we conclude that both of them
must be contained in b(x). Since one zero of u(x) corresponds to
one zero of v(z), this is impossible unless z_0 is a root of even
multiplicity of v(z) = 0.

After these remarks, the following theorem demands no explanation.

Theorem 12. A necessary and sufficient condition that there
exists a moving average (249) with autocorrelation coefficients equalling
u_k for 1 ≤ k ≤ h is that the auxiliary polynomial v(z) defined
by (289) has no zero z_0 of odd multiplicity in the real interval
−2 < z_0 < 2.
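For h = 1 and h = 2 the criterion of theorem 12 can be checked by hand with the quadratic formula. The following sketch covers only these two cases, using v(z) = u_1 z + 1 and v(z) = u_2 z^2 + u_1 z + 1 − 2u_2, and treats a vanishing discriminant (within a small tolerance) as a double root. The function name and the tolerance are choices made here, and the leading coefficient is assumed to be non-zero.

```python
import math

def moving_average_exists(u, tol=1e-9):
    """Theorem 12 for h = 1 or 2: no zero of odd multiplicity with -2 < z < 2."""
    if len(u) == 1:
        roots = [(-1.0 / u[0], 1)]                     # v(z) = u_1*z + 1
    elif len(u) == 2:
        a, b, c = u[1], u[0], 1.0 - 2.0 * u[1]         # v(z) = a*z^2 + b*z + c
        disc = b * b - 4.0 * a * c
        if disc < -tol:
            return True                                # complex pair: no real zero
        if disc <= tol:
            roots = [(-b / (2.0 * a), 2)]              # double root
        else:
            s = math.sqrt(disc)
            roots = [((-b - s) / (2.0 * a), 1), ((-b + s) / (2.0 * a), 1)]
    else:
        raise ValueError("this sketch covers h = 1 and h = 2 only")
    return not any(-2.0 < z < 2.0 and mult % 2 == 1 for z, mult in roots)

print(moving_average_exists([0.614]))          # single prescribed coefficient
print(moving_average_exists([0.614, 0.090]))   # two prescribed coefficients
print(moving_average_exists([0.614, 0.126]))   # second coefficient slightly raised
```

Values of u for which a root falls exactly on the boundary z = ±2 are sensitive to rounding and are best examined separately.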

If this condition is satisfied, the sequences (b) sought for will be
given by the coefficients in the real polynomials b(x) satisfying the
relation (288). In full agreement with the analysis in section 26,
we conclude further from the above discussion that there are at
most 2^h such sequences (b), and that the polynomials b(x) may be
written on the form (x − x_1)(x − x_2)...(x − x_h), where (denoting
by z_1, z_2, ..., z_h the roots of v(z) = 0) the real or complex quantity
x_1 is a root of P(x, z_1) = 0, and x_2 is a root of P(x, z_2) = 0, etc.

Returning to the Beveridge index of wheat prices, we shall
give a few applications of the method outlined above.

In the correlogram r̄_k (see fig. 7), the small deviations from zero
for k > 1 might perhaps be looked upon as pure chance products.
Thus we are led to investigate whether there exists a moving
average η(t) + b_1 η(t − 1) with autocorrelation coefficient r_1 equalling
0.614. Putting h = 1, and u_1 = 0.614, and following the general
method, we get u(x) = 0.614 x + 1 + 0.614 x^{−1}, and v(z) = 0.614 z + 1.
Since the root −1/0.614 = −1.63 of v(z) = 0 is lying in the critical
interval −2 < z < 2, we conclude from theorem 12 that there
exists no moving average with r_1 = 0.614 and r_k = 0 for k > 1.

A short reflection shows that in all moving averages of type
η(t) + b_1 η(t − 1) we have −0.5 ≤ r_1 ≤ 0.5. It is further evident that
there is only one average of this type such that r_1 = 0.5, viz.

(292)    ξ(t) − m = η(t) + η(t − 1).

Consequently, this average, the correlogram of which is shown
in fig. 7, will yield the closest fit to the prescribed value r̄_1 = 0.614.
If we rest satisfied with the simple average (292), we have to
interpret the deviations between the serial coefficients r̄_k given on
p. 151 and the values r_1 = 0.5, r_2 = r_3 = ... = 0 as due to chance. If
a better fit is desired, averages involving more parameters (b) must
be used. A few examples of this will be given.
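The short reflection can be retraced numerically. For an average of two terms the autocorrelation coefficient is r_1 = b_1/(1 + b_1^2), a standard result, so that a prescribed value r_1 = 0.614 would require a real solution of 0.614 b_1^2 − b_1 + 0.614 = 0. The names below are illustrative.

```python
def ma1_autocorrelation(b1):
    """First autocorrelation coefficient of eta(t) + b1*eta(t - 1)."""
    return b1 / (1.0 + b1 * b1)

# Scan b1 over a grid from -5 to 5 in steps of 0.001.
grid = [i / 1000.0 for i in range(-5000, 5001)]
largest = max(ma1_autocorrelation(b) for b in grid)
smallest = min(ma1_autocorrelation(b) for b in grid)

# Discriminant of 0.614*b^2 - b + 0.614 = 0: negative means no real b1 exists.
discriminant = 1.0 - 4.0 * 0.614 ** 2

print(largest, smallest, discriminant)
```

The extremes ±0.5 are attained at b_1 = ±1, in agreement with (292), and the negative discriminant confirms that 0.614 is out of reach.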

As is readily verified, the approach u_1 = 0.614, u_2 = 0.090 gives
u(x) = 0.090 x^2 + 0.614 x + 1 + 0.614 x^{−1} + 0.090 x^{−2}, and
v(z) = 0.090 z^2 + 0.614 z + 0.820. The roots of v(z) = 0 being z_1 = −1.82 and
z_2 = −5.00, it follows from theorem 12 that there does not exist
any moving average with the autocorrelation coefficients prescribed.
However, a very slight modification in u_2 will suffice to remove
z_1 from the critical interval. In fact, expressing that a root (291)
shall equal −2, we get (½ u_1 u_2^{−1} − 2)^2 = (½ u_1 u_2^{−1})^2 − (1 − 2u_2) u_2^{−1},
which gives u_2 = u_1 − ½ = 0.114.

The function v(z) corresponding to u_1 = 0.614, u_2 = 0.114 reads
0.114 z^2 + 0.614 z + 0.772, and we get z_1 = −2.000, z_2 = −3.386.
In order to prepare the construction of the corresponding sequences
(b), we solve P(x, −2) = x^2 + 2x + 1 = 0, which gives the double
root x = −1, and P(x, −3.386) = x^2 + 3.386 x + 1 = 0, which
gives the real roots x = −0.3269, and x = −3.0591. It follows
that there exist two binomials b(x) which satisfy the conditions
laid down, viz. b_1(x) = (x + 1)(x + 0.3269) = x^2 + 1.3269 x + 0.3269,
and b_2(x) = (x + 1)(x + 3.0591) = x^2 + 4.0591 x + 3.0591.

Using the terminology introduced in section 26, the binomial
b_1(x) gives rise to a regular moving average, viz.

(293)    ξ_1(t) − m = η(t) + 1.3269 η(t − 1) + 0.3269 η(t − 2).

Since there is only one more process in the group (ξ) of averages
with the same correlogram as ξ_1(t), it is evident that this one is
symmetrical with ξ_1(t), and thus given by

ξ_2(t) − m = 0.3269 η(t) + 1.3269 η(t − 1) + η(t − 2).

In full agreement with the general theory, ξ_2(t) can alternatively
be derived from b_2(x) by multiplying the coefficients by K_1/K_2,
taking for K_1^2 the sum of the squared coefficients in b_1(x), and
similarly for K_2^2; a short calculation will verify that K_1/K_2 =
0.3269, etc.

To check the coefficients (b) obtained, it is sufficient to compute
the autocorrelation coefficients of ξ_1(t) from the general formula
(250). As it should be, we find r_1 = 0.614, r_2 = 0.114.
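The check can be sketched as follows, assuming that formula (250) is the usual expression r_k = Σ b_i b_{i+k} / Σ b_i^2 for the autocorrelation coefficients of a moving average (with b_0 = 1). The function and variable names are illustrative.

```python
def ma_autocorrelations(b):
    """r_1..r_h of the moving average with coefficients b = [b_0, b_1, ..., b_h]."""
    norm = sum(c * c for c in b)
    return [sum(b[i] * b[i + k] for i in range(len(b) - k)) / norm
            for k in range(1, len(b))]

regular = [1.0, 1.3269, 0.3269]        # coefficients of (293)
mirrored = regular[::-1]               # the symmetrical average
from_b2 = [1.0, 4.0591, 3.0591]        # coefficients of b_2(x), before rescaling

r_regular = ma_autocorrelations(regular)
r_mirrored = ma_autocorrelations(mirrored)
r_from_b2 = ma_autocorrelations(from_b2)
print([round(r, 3) for r in r_regular])
```

A common factor in the coefficients cancels in the ratio, which is why b_2(x) yields the same correlogram before and after multiplication by K_1/K_2.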

Observing that the increase in u_2 from 0.090 to 0.114 has brought
on a decrease in z_1 from −1.82 to −2.00, and an increase in z_2
from −5.00 to −3.39, it is seen that the parameters are very
susceptible to variations in the initial u-values. Another example
of this is given by the fact that a second slight increase in u_2
will make the roots z_1 and z_2 coincide. As follows from (291), this
will occur when u_2 satisfies the relation (0.307)^2 = u_2 (1 − 2u_2), i.e.
when u_2 = 0.1260.

Using the new value u_2 = 0.1260, and taking as before u_1 =
r̄_1 = 0.614, we get v(z) = 0.1260 z^2 + 0.614 z + 0.7480. The roots of
v(z) = 0 read z_1 = z_2 = −0.614/0.252 = −2.4365, and those of
P(x, −2.4365) = 0 are x = −0.5225 and x = −1.9140. Consequently,
the binomial b(x) corresponding to the regular one among
the moving averages sought for is simply (x + 0.5225)^2 =
x^2 + 1.0450 x + 0.2730. Thus the regular moving average reads

(294)    ξ_1(t) − m = η(t) + 1.0450 η(t − 1) + 0.2730 η(t − 2).

Because of the symmetry, it follows that the moving average
corresponding to b(x) = (x + 1.9140)^2 is given by

(295)    ξ_2(t) − m = 0.2730 η(t) + 1.0450 η(t − 1) + η(t − 2).

The group (ξ) in this case consists of three processes. The remaining
one is obtained from b(x) = (x + 0.5225)(x + 1.9140) =
x^2 + 2.4365 x + 1. A short calculation gives for this process

(296)    ξ_3(t) − m = 0.5225 η(t) + 1.2730 η(t − 1) + 0.5225 η(t − 2).

Checking the calculations, we find that the autocorrelation coefficients
of each of the processes (294)–(296) are given by r_1 = 0.6140,
r_2 = 0.1260. The correlogram of the group (294)–(296) is shown
in fig. 7.
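The construction of the three processes can be retraced from the double root alone. In the sketch below one root of P(x, z) = 0 is chosen for each of the two coincident z-roots, the resulting polynomials are multiplied out, and each is rescaled by the ratio of K for the regular average to its own K. Taking K^2 as the sum of squared coefficients follows the text. The function names and the MA autocorrelation formula (the usual Σ b_i b_{i+k} / Σ b_i^2) are assumptions of this sketch.

```python
import math

def roots_of_P(z):
    """Real roots of x^2 - z*x + 1 = 0 (requires |z| >= 2), inner root first."""
    s = math.sqrt(z * z / 4.0 - 1.0)
    return sorted([z / 2.0 - s, z / 2.0 + s], key=abs)

def ma_autocorrelations(b):
    """r_1..r_h of the moving average with coefficients b."""
    norm = sum(c * c for c in b)
    return [sum(b[i] * b[i + k] for i in range(len(b) - k)) / norm
            for k in range(1, len(b))]

z = -0.614 / 0.252                           # double root of v(z) = 0
inner, outer = roots_of_P(z)

# One x-root per z-root: (inner, inner), (outer, outer), (inner, outer).
choices = [(inner, inner), (outer, outer), (inner, outer)]
polys = [[1.0, -(x1 + x2), x1 * x2] for x1, x2 in choices]

K_regular = math.sqrt(sum(c * c for c in polys[0]))
group = []
for b in polys:
    K = math.sqrt(sum(c * c for c in b))
    group.append([c * K_regular / K for c in b])   # rescale to the regular K

for b in group:
    print([round(c, 4) for c in b], [round(r, 4) for r in ma_autocorrelations(b)])
```

The three rescaled coefficient sets agree with (294)–(296) to about three decimals, and each has the correlogram r_1 = 0.614, r_2 = 0.126.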

The appreciable effect of even small changes in u_2 is evident
from the above. Comparing the regular averages (293) and (294),
it is seen that the increase in u_2 from 0.114 to 0.126 has caused a
decrease in b_1 from 1.3269 to 1.0450, and a decrease in b_2 from
0.3269 to 0.2730.

The following example shows in detail how the method works
when v(z) = 0 has complex roots. Let us start from the values
u_1 = 0.60, u_2 = 0.09, u_3 = −0.15, u_4 = −0.10, which are seen to
approximate very closely the first four serial coefficients in the
index under investigation. With but little reduction we obtain
−10^2 v(z) = 10 z^4 + 15 z^3 − 49 z^2 − 105 z − 62. Solving v(z) = 0,
we get z_1 = −2.1272, z_2 = 2.5103, z_3 = −0.9415 + 0.5240 i,
z_4 = −0.9415 − 0.5240 i. Concluding from theorem 12 that there exists
a group of moving averages with the prescribed correlogram, a
short reflection shows that the group consists of 8 processes.
Solving P(x, z_1) = 0, we get x = −0.7013, and x = −1.4259,
while P(x, z_2) = 0 gives x = 0.4966 and x = 2.0137. The roots of
P(x, z_3) = 0 were found to be x = −0.3381 − 0.6679 i, and
x = −0.6034 + 1.1919 i. According to the discussion preceding theorem 12, the
roots of P(x, z_4) = 0 are obtained by replacing i by −i in the roots
of P(x, z_3) = 0.

Writing

$B(x) = (x + .3381 - .6679\,i)(x + .3381 + .6679\,i) = x^2 + .6762\,x + .560402,$

the regular moving average will be obtained from
$b(x) = (x + .7013)(x - .4966) \cdot B(x) = x^4 + .8809\,x^3 + .3505\,x^2 - .1208\,x - .1952$,
and reads

(297) $\eta(t) + .8809\,\eta(t-1) + .3505\,\eta(t-2) - .1208\,\eta(t-3) - .1952\,\eta(t-4).$

Squaring the coefficients, we get the sum $K^2 = 1.95153$.

According to the general analysis, a second average with the
same correlogram will be delivered by
$b_1(x) = (x + 1.4259)(x - .4966) \cdot B(x) = x^4 + 1.6055\,x^3 + .4807\,x^2 + .0420\,x - .3968$.
The sum of squared coefficients is $3.967917$, which gives
$K/K_1 = .701304$. Multiplying the coefficients in $b_1(x)$ by this
factor, we get the coefficients in the corresponding moving average.
This is found to be

$.7013\,\eta(t) + 1.1259\,\eta(t-1) + .3371\,\eta(t-2) + .0294\,\eta(t-3) - .2783\,\eta(t-4).$

A third moving average in the group is seen to be obtained
from $(x + .7013)(x - 2.0137) \cdot B(x)$. Proceeding as before, we find
for the corresponding process

$.4966\,\eta(t) - .3159\,\eta(t-1) - .8637\,\eta(t-2) - .8395\,\eta(t-3) - .3930\,\eta(t-4).$

In the same way, the polynomial $(x + 1.4259)(x - 2.0137) \cdot B(x)$
yields a fourth process in the group, viz.

$.3483\,\eta(t) + .0308\,\eta(t-1) - .9432\,\eta(t-2) - .7909\,\eta(t-3) - .5604\,\eta(t-4).$

The four remaining averages in the group considered correspond
to the complex roots $x = -.6034 \pm 1.1919\,i$ of $u(x) = 0$. Due to
the symmetry, these processes may be obtained directly from the
four processes above by reversing the order of the coefficients.
For instance, the regular average (297) gives

$-.1952\,\eta(t) - .1208\,\eta(t-1) + .3505\,\eta(t-2) + .8809\,\eta(t-3) + \eta(t-4).$

The above computations have been checked by verifying that
the autocorrelation coefficients of the four processes equal the
prescribed values $r_1 = .6000$, $r_2 = .0900$, $r_3 = -.1500$, $r_4 = -.1000$.
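The construction of the whole group can be retraced with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the roots are the four-decimal values quoted above (so agreement is only to about three decimals), and each root pair of $v(z) = 0$ contributes either its inner or its outer root(s), complex roots always entering as a conjugate pair.

```python
from itertools import product

def poly_from_roots(roots):
    """Real coefficients of prod (x - root), highest power first."""
    coeffs = [1 + 0j]
    for root in roots:
        shifted = coeffs + [0j]                      # coeffs * x
        scaled = [0j] + [-root * c for c in coeffs]  # coeffs * (-root)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return [c.real for c in coeffs]

def ma_autocorrelations(b):
    """Autocorrelation coefficients r_1..r_h of a moving average with weights b."""
    variance = sum(c * c for c in b)
    return [sum(b[i] * b[i + k] for i in range(len(b) - k)) / variance
            for k in range(1, len(b))]

# Inner choice first, outer choice second, for each root z_1, z_2, (z_3, z_4).
choices = [
    [(-0.7013,), (-1.4259,)],
    [(0.4966,), (2.0137,)],
    [(-0.3381 - 0.6679j, -0.3381 + 0.6679j),
     (-0.6034 + 1.1919j, -0.6034 - 1.1919j)],
]

group = [poly_from_roots([r for part in pick for r in part])
         for pick in product(*choices)]

k2 = sum(c * c for c in group[0])  # K^2 of the regular average (all roots inside)
for b in group:
    scale = (k2 / sum(c * c for c in b)) ** 0.5  # the factor K/K_i of the text
    print([round(scale * c, 4) for c in b],
          [round(r, 3) for r in ma_autocorrelations(b)])
```

The first polynomial produced is the regular one of (297); the remaining seven differ in their weights but share its correlogram up to the rounding of the roots.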

Having now exemplified the construction of sets ( b ) belonging to 
moving averages (249) with correlograms approximating that of the 
Beveridge wheat price index, we shall postpone the discussion of 
the results arrived at until we have made a few applications of 
the inversion formula (200) and the relation (260). 

If $\{\zeta(t)\}$ is a regular moving average (249), the primary process
$\{\eta(t)\}$ will be given either by (200) or by (255), the latter formula
corresponding to the exceptional case when the characteristic
equation of the difference relations satisfied by the coefficients (a)
presents roots of modulus unity. This characteristic equation is
nothing else than the equation $b(x) = 0$ used in the general method
exemplified above. This method being based on the calculation of the
roots of the equation mentioned, we are in a position to point out
directly which of the formulae (200) and (255) to apply in the
different examples, and to carry the analysis further on the basis
of the general developments in section 26.

Returning first to the approach (292), we have to apply (255),
for the root $x = -1$ of $b(x) = 0$ is of modulus unity. Proceeding
as in the illustration 1 of section 26, we get

(298) $\eta(t) = \lim_{1 > \alpha \to 1} \left[ \zeta(t) - m - \alpha\,(\zeta(t-1) - m) + \alpha^2 (\zeta(t-2) - m) - \cdots \right].$

By construction, the second approach (293) is also such that a root
of $b(x) = 0$ is of modulus unity.

In order to give an application of (200), we proceed to the
approach (294). According to theorem 8, the coefficients (a) will satisfy
the difference relation $a_k + 1.0450\,a_{k-1} + .2730\,a_{k-2} = 0$. The
characteristic equation being $b(x) = (x + .5225)^2 = 0$, it follows from
section 6 that $a_k$ may be written in the form
$a_k = (A + k \cdot B) \cdot (-.5225)^k$. The constants $A$ and $B$ may be
obtained from the initial values $a_0 = 1$, $a_1 = -1.0450$ (cf. (97) and
(197)). A short calculation gives $A = B = 1$, and hence
$a_k = (1 + k)(-.5225)^k$. In applying formula (200), we have to insert
this expression, and to replace $\{\zeta(t)\}$ by $\{\zeta_1(t) - m\}$, where
$m = E[\zeta_1(t)]$. Observing that
$\sum_{k=0}^{\infty} a_k = (1.5225)^{-2} = .4314$, we get the inversion formula

(299) $\eta(t) = -.4314\,m + \zeta_1(t) - 2\,(.5225)\,\zeta_1(t-1) + 3\,(.5225)^2\,\zeta_1(t-2) - 4\,(.5225)^3\,\zeta_1(t-3) + \cdots$

It is seen that the approach (297) gives a formula of the same
type, but, of course, with more complicated coefficients (a).

Denoting the Beveridge index by $\zeta_t$, and assuming that $\zeta_t$ is
a sample series of the moving average $\{\zeta_1(t)\}$ given by (294),
formula (299) may be used for deriving a series $\eta_t$ which
corresponds to this hypothesis. Identifying $m$ with the average
$\bar{m}$ of the index series $\zeta_t$, we get

(300) $\eta_t = -.4314\,\bar{m} + \zeta_t - 1.0450\,\zeta_{t-1} + .8190\,\zeta_{t-2} - .5706\,\zeta_{t-3} + \cdots$
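The numerical coefficients in (299) and (300) can be reproduced from the difference relation alone. The sketch below is illustrative only: `inverse_weights` is a hypothetical helper, and $b_2$ is taken as $(.5225)^2$ rather than the rounded $.2730$ so that the double root is exact and the closed form $a_k = (1 + k)(-.5225)^k$ is matched to machine precision.

```python
def inverse_weights(b1, b2, n_terms):
    """Weights a_k with a_0 = 1, a_1 = -b1 and a_k + b1*a_{k-1} + b2*a_{k-2} = 0."""
    a = [1.0, -b1]
    while len(a) < n_terms:
        a.append(-b1 * a[-1] - b2 * a[-2])
    return a[:n_terms]

p = -0.5225
a = inverse_weights(1.0450, 0.5225 ** 2, 40)
closed_form = [(1 + k) * p ** k for k in range(40)]

print([round(x, 4) for x in a[:4]])   # leading weights of (300)
print(round(sum(a), 4))               # factor multiplying the mean m
```

The first four weights are $1$, $-1.0450$, $.8190$, $-.5706$, and the sum of the weights is $.4314$, in agreement with the constants appearing in (299) and (300).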

The sum of the 100 index fluctuations $\zeta_t$ given in table 7 being
$-28$, we get $\bar{m} = -.28$. Inserting this, and using the index
deviations given in col. (1), formula (300) has given the series $\eta_t$
presented in col. (2). Having put $a_k = 0$ for $k > 12$, the first 12
$\eta_t$ values are partly based on index fluctuations not given in the
table.

Apart from the constant $\bar{m} = -.28$, the moving average (294) as
performed on col. (2) must reproduce col. (1). Thus the values $\eta_t$
obtained may be checked by the simple identity

(301) $\zeta_t - \bar{m} = \eta_t + 1.0450\,\eta_{t-1} + .2730\,\eta_{t-2}.$

Because of the large number of terms in (300) and (301), there will
sometimes be a deviation amounting to $.2$ or $.3$.
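As a small illustration of the check (301), the sketch below applies the identity to the years 1811-1814, using the entries of table 7 for 1809-1814 as read from cols. (1) and (2). The dictionary layout and the name `identity_gap` are assumptions made for the example only.

```python
M_BAR = -0.28            # average of the 100 index fluctuations
B1, B2 = 1.0450, 0.2730  # weights of the regular average (294)

# Table 7 entries: index fluctuations (col. 1) and primary series (col. 2).
zeta = {1811: 40, 1812: 21, 1813: -4, 1814: -4}
eta = {1809: -6.0, 1810: 9.8, 1811: 31.8, 1812: -14.6, 1813: 2.9, 1814: -2.7}

def identity_gap(year):
    """Left side minus right side of (301) for the given year."""
    rebuilt = eta[year] + B1 * eta[year - 1] + B2 * eta[year - 2]
    return (zeta[year] - M_BAR) - rebuilt

gaps = {year: identity_gap(year) for year in zeta}
for year, gap in gaps.items():
    print(year, round(gap, 2))
```

For these four years the gaps stay well inside the tolerance of $.2$ or $.3$ mentioned in the text.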

In analogy with the above, the series $\eta_t$ corresponding to the
regular approach (292) may be obtained by a limit passage based
on the relation (255). Having exemplified in illustration 1 of
section 26 the limit procedure for deriving a primary series $\eta_t$, it
is seen that in point of principle no complications will be met.
Accordingly, we shall not dwell further on the exceptional cases when
$b(x) = 0$ presents a root of modulus unity.

Table 7. Beveridge wheat price index fluctuations (col. 1), and
hypothetical primary series $\eta_t$ (col. 2).

Year  (1)    (2) | Year  (1)    (2) | Year  (1)    (2) | Year  (1)    (2)
1770   31   23.9 | 1795   30   17.6 | 1820  -16     .8 | 1845   15   19.6
1771   36    9.3 | 1796   -6  -27.2 | 1821  -24  -19.3 | 1846   39   19.9
1772   19    3.0 | 1797  -16    7.8 | 1822  -23   -2.8 | 1847  -10  -35.9
1773    6     .6 | 1798  -13  -13.5 | 1823  -29  -20.5 | 1848  -20   12.3
1774    5    3.9 | 1799   20   32.2 | 1824  -29   -6.4 | 1849  -26  -28.8
1775  -12  -16.9 | 1800   39    9.3 | 1825  -31  -18.3 | 1850  -22    6.0
1776  -16    -.2 | 1801   17   -1.2 | 1826  -18    3.2 | 1851  -14  -11.1
1777   -6   -1.2 | 1802    6    4.0 | 1827   -7   -5.1 | 1852    5   15.5
1778  -13  -11.4 | 1803   -6   -9.6 | 1828   14   18.7 | 1853   38   25.1
1779  -21   -8.5 | 1804   26   34.2 | 1829    3  -14.5 | 1854   41   10.8
1780  -13    -.7 | 1805   14  -18.9 | 1830   10   20.7 | 1855   38   20.1
1781  -12   -8.6 | 1806   -2    8.6 | 1831    5  -12.3 | 1856    7  -16.7
1782   -6    3.6 | 1807   -7  -10.6 | 1832  -18  -10.6 | 1857  -18   -5.8
1783   -6   -6.9 | 1808   -6    3.1 | 1833  -20   -5.4 | 1858  -19   -8.1
1784   -8   -1.3 | 1809   -6   -6.0 | 1834  -22  -13.3 | 1859   -3    7.4
1785  -16  -11.4 | 1810    4    9.8 | 1835  -18   -2.5 | 1860   16   10.7
1786  -16   -3.4 | 1811   40   31.8 | 1836  -12   -5.6 | 1861    7   -6.0
1787   -7     .0 | 1812   21  -14.6 | 1837    2    8.7 | 1862   -8   -4.4
1788    8    9.2 | 1813   -4    2.9 | 1838   17    9.6 | 1863  -21  -14.6
1789    8   -1.4 | 1814   -4   -2.7 | 1839    7   -5.2 | 1864  -19   -2.4
1790  -14  -14.8 | 1815   30   32.3 | 1840   -5   -1.9 | 1865   -6     .7
1791  -22   -6.9 | 1816   78   45.2 | 1841    1    4.8 | 1866   19   19.3
1792  -13   -2.6 | 1817   26  -29.7 | 1842   -8  -12.1 | 1867   18   -1.9
1793  -16  -10.6 | 1818   -6   13.1 | 1843  -12    -.3 | 1868   -7   -9.8
1794    3   14.9 | 1819  -14  -19.3 | 1844   -8   -4.1 | 1869    2   13.1

Starting from the hypothesis that a given time series $\zeta_t$ is a
sample series of a regular moving average (249), we have above
exemplified a general method for deriving the corresponding primary
series $\eta_t$. We are now in a position to solve the analogous
problem under the hypothesis that $\zeta_t$ is a sample series of a
non-regular average (cf. p. 129). After the detailed treatment of the
regular case, it will be sufficient to deal with the non-regular
averages very briefly.

Let the characteristic equation of a non-regular moving average
$\{\zeta(t)\}$ be given by (cf. (258))

$b^{(i)}(x) = \frac{K}{K^{(i)}} \left( x^h + b_1^{(i)} x^{h-1} + \cdots + b_{h-1}^{(i)} x + b_h^{(i)} \right)$
$= \frac{K}{K^{(i)}} \left( x^p + A_1 x^{p-1} + \cdots + A_{p-1} x + A_p \right) \left( x^q + B_1 x^{q-1} + \cdots + B_{q-1} x + B_q \right) = 0,$

where $p + q = h$, and where the roots of $x^p + A_1 x^{p-1} + \cdots + A_p = 0$
are lying inside, and those of $x^q + B_1 x^{q-1} + \cdots + B_q = 0$ outside
the unit circle.

The coefficients $A_i$ and $B_i$ being real, two real sequences $p_i$ and
$q_i$ will be defined by the systems (cf. (97))

$p_n + A_1 p_{n-1} + \cdots + A_{n-1} p_1 + A_n = 0, \quad n = 1, 2, \ldots, p - 1;$
$p_n + A_1 p_{n-1} + \cdots + A_{p-1} p_{n-p+1} + A_p p_{n-p} = 0, \quad n \geq p;$

$B_q q_n + B_{q-1} q_{n-1} + \cdots + B_{q-n+1} q_1 + B_{q-n} = 0, \quad n = 1, 2, \ldots, q - 1;$
$B_q q_n + B_{q-1} q_{n-1} + \cdots + B_1 q_{n-q+1} + q_{n-q} = 0, \quad n \geq q;$

$p_0 = q_0 = 1$. It is seen that the sum

$\alpha(t) = \zeta(t) + p_1 \zeta(t-1) + p_2 \zeta(t-2) + \cdots$

will be convergent. Paying regard to certain evident relations
between the coefficients $A_i$, $B_i$, and $b_i^{(i)}$, it follows further

(302) $\alpha(t) = K \cdot [\eta(t) + B_1 \eta(t-1) + \cdots + B_q \eta(t-q)] / K^{(i)}.$

Thus prepared, let us form the sum

$\beta(t) = \alpha(t) + q_1 \alpha(t+1) + q_2 \alpha(t+2) + \cdots.$

Observing that the roots of the equation

$B_q x^q + B_{q-1} x^{q-1} + \cdots + B_1 x + 1 = 0$

are of modulus less than unity, it follows without difficulty that
this sum also is convergent, and that $\beta(t) = B_q \cdot K \cdot \eta(t-q) / K^{(i)}$.
Since $B_q \neq 0$, we conclude that, apart from a constant factor, the
repeated linear operation performed will yield $\{\eta(t)\}$ in terms of the
non-regular moving average considered.

Comprehending the double transformation, we get

(303) $\{\eta(t)\} = \frac{K^{(i)}}{B_q \cdot K} \cdot \sum_{n=-\infty}^{\infty} c_n \cdot \{\zeta(t + q - n)\},$

where $c_0 = 1 + p_1 q_1 + p_2 q_2 + \cdots$, and

$c_n = p_n + p_{n+1} q_1 + p_{n+2} q_2 + \cdots, \quad n > 0,$
$c_{-n} = q_n + q_{n+1} p_1 + q_{n+2} p_2 + \cdots, \quad n > 0.$

The generalization to the case when $b(x) = 0$ presents roots on
the periphery of the unit circle is, of course, straightforward.

In order to give an explicit application of the relation (303), let
us consider the group ($\zeta$) belonging to the regular average (294)
studied in detail before. As to (295), we get by analogy from (299),
and in full agreement with (303),

(304) $\eta(t-2) + .4314\,m = \zeta_2(t) - 2\,(.5225)\,\zeta_2(t+1) + 3\,(.5225)^2\,\zeta_2(t+2) - \cdots$

Considering the remaining average (296) in the group, the coefficients
appearing in (303) are seen to reduce to $p_n = q_n = (-.5225)^n$. Hence
$c_n = p^{|n|}/(1 - p^2)$, $p = -.5225$, for all $n$. Since $K/K^{(i)}$ is
nothing else than the factor of $\eta(t)$ in (296), and $B_q = 1/.5225$, we get

(305) $\eta(t-1) = \sum_{n=-\infty}^{\infty} [\zeta_3(t+n) - m] \cdot p^{|n|}/(1 - p^2), \quad p = -.5225.$

By means of the formulae (304) and (305), it is possible to derive
explicitly the two series $\eta_t$ corresponding to the hypotheses that
the Beveridge index is a moving average (295) or (296) respectively.
The calculations running as in the regular case, no detailed
illustration will be given. Of course, the last $\eta_t$ values can be
calculated only approximately. It should be observed that a complete
check may be based on identities similar to (301).
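The two-sided formula (305) can be tried out on simulated data. The following is a sketch under explicit assumptions: the primary series is drawn as Gaussian noise with a fixed seed, the mean $m$ is taken as zero, the weights of (296) are written as $-p$, $1 + p^2$, $-p$ with $p = -.5225$ (which equal $.5225$, $1.2730$, $.5225$ to four decimals), and the infinite sum is truncated at $|n| \leq 60$. The function names are hypothetical.

```python
import random

P = -0.5225

def simulate_296(eta):
    """zeta_3(t) - m as in (296), for t = 2 .. len(eta) - 1 (m assumed zero)."""
    return {t: -P * eta[t] + (1 + P * P) * eta[t - 1] - P * eta[t - 2]
            for t in range(2, len(eta))}

def recover_eta(zeta, t, span=60):
    """eta(t - 1) from (305), with the sum truncated to |n| <= span."""
    total = sum(zeta[t + n] * P ** abs(n) for n in range(-span, span + 1))
    return total / (1 - P * P)

random.seed(0)
eta = [random.gauss(0.0, 1.0) for _ in range(400)]
zeta = simulate_296(eta)

# Only interior points have all 2 * span + 1 neighbours available.
worst = max(abs(recover_eta(zeta, t) - eta[t - 1]) for t in range(62, 340))
print(worst)
```

Near the ends of an observed series the future terms are missing, which is the reason why the last values of the primary series can be obtained only approximately.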

If a series $\eta_t$ corresponding to a certain process in a group ($\zeta$)
has been derived, it will sometimes be possible to arrive at the primary
series, say $\eta^*_t$, corresponding to another average in the group by
means of a simpler procedure than that based on (303). For
instance, representing by $\eta_t$ and $\eta^*_t$ the primary series
corresponding to (294) and (296) respectively, the relation (260) gives

$-\eta_t = p\,\eta^*_t - (1 - p^2)\,\eta^*_{t-1} - p\,(1 - p^2)\,\eta^*_{t-2} - p^2 (1 - p^2)\,\eta^*_{t-3} - \cdots,$

where $p = -.5225$, and hence $p\,\eta^*_t - \eta^*_{t-1} = p\,\eta_{t-1} - \eta_t$.
The series $p\,\eta_{t-1} - \eta_t$, say $\varepsilon_t$, being readily derived
from the series $\eta_t$ in table 7, we get the simple relation (cf. (305))

$\eta^*_{t-1} = -\varepsilon_t - p\,\varepsilon_{t+1} - p^2\,\varepsilon_{t+2} - p^3\,\varepsilon_{t+3} - \cdots$

Having now given an account of some applications of the general 
theory of moving averages to the Beveridge wheat price index, we 
shall next attach some remarks of general scope to certain points 
in the analysis. Let us in the first place touch upon the problem 
of testing the results derived under a hypothesis of moving 
averages. 

In a process $\{\zeta(t)\}$ of moving averages as given by (249), a
characteristic feature is that $\zeta(t)$ is independent of $\zeta(t-h-k)$ for
$k > 0$. Thus, (249) will form no adequate basis for the analysis of a
time series $\zeta_t$ unless the scatters $(\zeta_t, \zeta_{t-h-1})$,
$(\zeta_t, \zeta_{t-h-2})$, etc. approximate distributions of independent
variables. On this line it would, of course, be possible to develop
different types of tests. Figure 8 contains the scatters
$(\zeta_t, \zeta_{t-1})$ and $(\zeta_t, \zeta_{t-2})$ belonging to the Beveridge
data given in table 7. Figure 8 (left) clearly shows that $\zeta_t$ and
$\zeta_{t-1}$ must be considered interdependent. On the other hand, the
scatter $(\zeta_t, \zeta_{t-2})$ seems to permit us to look upon $\zeta_t$ and
$\zeta_{t-2}$ as independent, a circumstance speaking in favour of an
average of the simple type $\zeta_t = \eta_t + b\,\eta_{t-1}$.

Fig. 8. Beveridge wheat price index, $\zeta_t$. The scatters
$(\zeta_t, \zeta_{t-1})$ (left) and $(\zeta_t, \zeta_{t-2})$ (right).

In starting from a specified scheme of moving averages, we are
in a position to derive a hypothetical value for any characteristic
of the series $\zeta_t$ and of the corresponding series $\eta_t$ calculated as
indicated above. In point of principle, every such characteristic as
compared with the corresponding empirical one will give a basis for
testing the hypothetical set-up.

Of course, when a scheme with correlogram approximating the
empirical one is chosen, there will automatically be an agreement
in the main between certain hypothetical and empirical
characteristics. For instance, the serial coefficients of $\eta_t$ will
approximate zero, since the deviations from this hypothetical value will
be due only to the differences between the hypothetical and the
empirical correlogram. The situation is the same with regard to the
ratio between the variances of $\eta_t$ and $\zeta_t$. Considering e.g. the
approach (294), the hypothetical value of this quotient is
$1/(1 + 1.0450^2 + .2730^2) = .462$. On the other hand, the variances of
the series $\eta_t$ and $\zeta_t$ being $213.7$ and $383.6$ respectively, the
empirical quotient equals $.557$.
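The two quotients can be recomputed directly from the figures quoted. This is a trivial sketch; the variable names are chosen for the example only.

```python
# Weights of the regular average (294), with b_0 = 1.
b = [1.0, 1.0450, 0.2730]

# Hypothetical quotient: variance of the primary series over variance of the average.
hypothetical = 1 / sum(c * c for c in b)

# Empirical quotient from the variances quoted in the text.
empirical = 213.7 / 383.6

print(round(hypothetical, 3), round(empirical, 3))
```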

By construction, the moving averages in a group ($\zeta$) will present
the same correlogram and the same variance. We conclude from
the above that in point of principle the autocorrelation properties
of the corresponding series $\eta_t$ will give us no criterion for
distinguishing which of the processes should be preferred. Expressing
the argument in other words, the deviations in $r_k(\eta_t)$ and $D(\eta_t)$
from the hypothetical values will depend solely on the differences
between the hypothetical and empirical correlograms of $\zeta_t$, and
these differences are exactly equal for all processes in the group
($\zeta$).

The forecasts based on the hypothesis of moving averages disclose
an interesting aspect of the test problem. Denoting by $F_t[\zeta(t+k)]$
the forecast over $k$ time units based on the development up to the
time point $t$, the general formula (213) gives in the approach (294)
the following forecasts

(306) $F_t[\zeta(t+1)] = m + 1.0450\,\eta_t + .2730\,\eta_{t-1};$
$F_t[\zeta(t+2)] = m + .2730\,\eta_t;$

and $F_t[\zeta(t+k)] = m$ for $k > 2$. In particular, taking $t = 1811$,
table 7 shows that $\eta_t = 31.8$, $\eta_{t-1} = 9.8$. Keeping in mind that
$m = -.28$, we get $F_t[\zeta(t+1)] = 35.6$, $F_t[\zeta(t+2)] = 8.4$, while the
actual path of the index runs through $\zeta_{t+1} = 21$, $\zeta_{t+2} = -4$.
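The forecast rule (306) is easily put into a few lines. The sketch below is an illustration only: `ma_forecast` is a hypothetical helper, and the inputs are the values for $t = 1811$ quoted above. With these inputs the two-step forecast evaluates to $m + .2730 \cdot 31.8$, i.e. about $8.4$.

```python
def ma_forecast(mean, b, eta_recent, k):
    """k-step forecast m + b_k*eta_t + b_{k+1}*eta_{t-1} + ... + b_h*eta_{t-h+k}.

    b holds the weights b_0..b_h; eta_recent = [eta_t, eta_{t-1}, ...].
    """
    h = len(b) - 1
    if k > h:
        return mean  # beyond h steps the forecast reduces to the mean
    return mean + sum(b[k + j] * eta_recent[j] for j in range(h - k + 1))

b = [1.0, 1.0450, 0.2730]   # regular average (294)
eta_recent = [31.8, 9.8]    # eta_1811 and eta_1810 from table 7
m = -0.28

f1 = ma_forecast(m, b, eta_recent, 1)
f2 = ma_forecast(m, b, eta_recent, 2)
f3 = ma_forecast(m, b, eta_recent, 3)
print(round(f1, 1), round(f2, 1), f3)
```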

Considering on the other hand the approach (295), which belongs
to the same group as (294), the forecast formulae corresponding to
(306) would read $F_t[\zeta(t+1)] = m + 1.0450\,\eta_t + \eta_{t-1}$,
$F_t[\zeta(t+2)] = m + \eta_t$. However, in this case we cannot derive
$\eta_t$ and $\eta_{t-1}$ from the observed values $\zeta_t$, $\zeta_{t-1}$,
$\zeta_{t-2}$, ... Now, the autoregression analysis as developed in
Chapter II yields a linear forecast which is valid for any stationary
process with finite dispersion, and for which the squared deviation
from the future path is of minimum expectation. Writing
$\{\eta^{(r)}(t)\}$ for the residual process of the stationary process
considered, and paying regard to the results arrived at in section 26,
this forecast will in the present case reduce to

(307) $F_t[\zeta(t+k)] = b_k\,\eta^{(r)}_t + b_{k+1}\,\eta^{(r)}_{t-1} + \cdots + b_h\,\eta^{(r)}_{t-h+k},$

where the sequence (b) is identical with the coefficients in the
corresponding regular average. According to the general theory,
$\{\eta^{(r)}(t)\}$ is obtained simply by subjecting $\{\zeta(t)\}$ to the same
linear operation which gives the primary process $\{\eta(t)\}$ in terms of
the regular average.

In taking the squared deviation as the measure of the efficiency
of a forecast, we conclude from the above that the different
hypothetical averages in a group ($\zeta$) will give rise to exactly the
same forecast series $F_t[\zeta(t+1)]$, $F_t[\zeta(t+2)]$, etc. A simple
illustration of this is given by the group (294)-(296). Taking e.g. the
process (295) for a hypothetical basis, we have in the first place to
form the series $\eta^{(r)}_t$. According to the general analysis, this is
identical to col. (2) in table 7. Applying next the general relation
(307), and keeping in mind that the coefficients (b) coincide with those
in the regular average (294), it follows that the resulting forecasts
will equal those previously obtained on the basis of the regular
average (294).

Expressing the situation in other words, we have found that the
different averages in a group ($\zeta$) are equivalent in view of those
aspects of the general test problem hitherto considered. Since the
indeterminateness is due to our having dealt with only the
autocorrelation properties of the hypothetical models, we have to use
other types of methods for distinguishing between the different
averages in a group. Generally speaking, these tests should examine
to what degree the elements $\eta_t$ and $\eta_{t+k}$ resulting from a
special hypothesis might be considered not only uncorrelated but
also independent. The different averages in a group giving rise to
different primary series $\eta_t$, we must therefore compare in detail the
multi-dimensional scatters $(\eta_t, \eta_{t-1}, \ldots, \eta_{t-h})$. For the
sake of concreteness, the scatter $(\eta_t, \eta_{t-1})$ obtained from
col. (2) in table 7 is shown in fig. 9. To my eye, the scatter forms no
very good approximation to a distribution of independent variables. In
performing different tests of this nature, it may of course occur that
no series $\eta_t$ will give satisfactory results; perhaps one of the
scatters examined will suggest instead a non-linear scheme, e.g. a
moving average performed on the non-autocorrelated process defined by
(286). However, it would be beyond the frame of this study to enter
upon further details concerning the non-linear schemes and the
non-linear methods required for distinguishing between the different
moving averages belonging to the same group.

Fig. 9. Beveridge wheat price index. The scatter $(\eta_t, \eta_{t-1})$
belonging to a hypothetical primary series $\eta_t$.

As repeatedly emphasized in earlier sections, the applications
accounted for in the present study do not aim at reaching definitive,
quantitative results. Accordingly, no attempts will be made to test
the significance of the parameters arrived at under the different
hypotheses dealt with. Hence it is out of the question to draw
quantitative comparisons with earlier investigations of the wheat
price data analysed. As is well known, Sir W. Beveridge (1922) has
subjected his index to an extensive periodogram analysis, while
G. U. Yule (1926) has illustrated certain new correlogram methods by
the use of the Beveridge index numbers. However, it will be
illuminating to take up some points of these investigations for a
qualitative comparison with the moving average approach.

In his presentation of the wheat price data, Sir W. Beveridge
(1921) gives two series of index numbers, the trend present in the
first series being removed in the second. In contradistinction to
Beveridge and to the present writer, G. U. Yule (1926) works
on the first series. In his search for hidden periodicities, Yule
modifies the correlogram method because of the trend present in
the original data. Since Yule's approach takes into consideration
the differences of the series analysed, it might be interpreted as
an application of certain non-stationary or evolutive processes of
the homogeneous type defined by (192). The investigation thus
following a quite different line of research, no further comment
is called for in the present connexion.

The periodogram method being based on the assumption of
hidden periodicities, its use will result in a hypothetical scheme
additively built up by a functional element $y(t)$ consisting of a
number of superposed harmonics, and a random element or "error"
$\eta(t)$. Since a single harmonic involves three parameters, the total
number of parameters in a scheme of hidden periodicities amounts
to thrice the number of harmonics superposed. Specifying the
values for these parameters, we can compute the functional element
$y(t)$. Deducting $y(t)$ from the original data, say $\zeta_t$, we obtain the
corresponding path of the random component, $\eta_t = \zeta_t - y(t)$.

A common standard measure of the efficiency of the periodogram
analysis is obtained by dividing the variance of the errors $\eta_t$ by
the variance of the original data $\zeta_t$. This quotient, say $\kappa^2$, is
always below unity, and the closer to zero the quotient, the more
the functional element $y(t)$ will "explain" of the series under
analysis. For instance, the harmonic corresponding to the largest
ordinate in the periodogram 1545-1844 given by Sir W. Beveridge
((1922), p. 438) will leave an "error" $\eta_t$ with variance amounting to
91 % of the variance of the wheat price index. The harmonic in
question is of period $p = 15.225$.

On the other hand, a scheme of moving averages (249) is built
up by means of a random variable $\eta(t)$, here called primary, and
a set of coefficients or parameters (b). After having chosen
numerical values for the parameters (b), the general analysis in the
present section provides a method for deriving from the original
series $\zeta_t$ the corresponding path $\eta_t$ of the primary variable.

Even in the case of moving averages, the quotient $\kappa^2$ between
the variances of $\eta_t$ and $\zeta_t$ may be taken as a standard measure of
the efficiency of the analysis. The hypothetical value for $\kappa^2$ being
in this case $\kappa^2 = 1/(1 + b_1^2 + \cdots + b_h^2)$, it is seen that the
larger the coefficients (b), the less is $\kappa^2$, and the better the
result of the analysis. For instance, considering the $\eta$-values in
table 7 derived from the approach (294), we have, as already mentioned,
the empirical value $\kappa^2 = .557$ and the hypothetical value
$\kappa^2 = .462$. The small values obtained seem rather satisfactory, but
it must be remembered that the analysis has been restricted to
the last 100 Beveridge data 1770-1869. Thus, although the
approach (294) involves only two parameters, it might perhaps be
unfair to the scheme of hidden periodicities to compare directly
with the $\kappa^2$-value $.91$ derived from the whole series under the
hypothesis of one harmonic component.

An examination of the forecasts delivered will show clearly the
thorough-going difference between the scheme of hidden periodicities
and the scheme of moving averages.

In a scheme of hidden periodicities of type (39), the forecast
curve is identical to the functional element $y(t)$, i.e. the sum of
harmonics superposed (cf. p. 59). The hypothetical model thus
will provide a definite functional forecast over the infinite future.
The expected squared deviation between the actual development
and the forecast is independent of the period forecasted over, and
amounts to $D^2(\eta)$.

A scheme of moving averages gives a quite different type of
forecast. In fact, considering a moving average (249) ranging over
$h + 1$ time units, it is only the forecasts over the next $h$
observations that are effective; the forecasts beyond $h$ time units
are trivial, and reduce to the average of the original data. According
to the general analysis, the different averages in a group ($\zeta$) will
give rise to the same forecasts, viz.

$F_t[\zeta(t+k)] = m + b_k\,\eta_t + b_{k+1}\,\eta_{t-1} + \cdots + b_h\,\eta_{t-h+k},$

where the coefficients (b) are those appearing in the regular average
used as the basis for deriving the primary series $\eta_t$. The
hypothetical value for the squared deviation from the actual
development being (cf. (218))

$(1 + b_1^2 + \cdots + b_{k-1}^2) \cdot D^2(\eta),$

it is seen that the efficiency of the prognosis will decrease gradually
as the period forecasted over is extended.
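The gradual loss of efficiency can be made concrete for the approach (294). The sketch below is illustrative only: the helper name is hypothetical, and the variance of the primary variable is set to unity so that the output is the factor $1 + b_1^2 + \cdots + b_{k-1}^2$ itself, which stops growing once $k$ exceeds $h$.

```python
def forecast_error_variance(b, k, primary_variance=1.0):
    """(1 + b_1^2 + ... + b_{k-1}^2) * D^2(eta); constant once k exceeds h."""
    terms = b[:min(k, len(b))]  # b_0 = 1, b_1, ..., b_{k-1}
    return primary_variance * sum(c * c for c in terms)

b = [1.0, 1.0450, 0.2730]  # regular average (294), h = 2
total = forecast_error_variance(b, len(b))

for k in range(1, 5):
    v = forecast_error_variance(b, k)
    print(k, round(v, 4), round(v / total, 3))  # variance and its share of the total
```

For $k = 1$ the forecast error carries less than half of the total variance of the series; from $k = 3$ onwards the forecast is the mean and the error variance equals the variance of the series.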

Especially in view of economic time series, the type of forecast 
delivered by the scheme of moving averages seems a priori more 
realistic, seems to correspond better to what might be reasonably 
possible to find out from the past development. Further, considering 
the forecasts over a short period, the prognosis given by the scheme 
of moving averages is, as a rule, rather efficient. In my opinion, 
this is a circumstance of central importance, for often the main 
interest is concentrated upon the prognosis concerning the near 
future. 

As to the harmonic components suggested by a periodogram
analysis, these cannot always be interpreted in the light of what is
otherwise known of the phenomena under investigation. Lacking
other evidence, the periodicities thus will stand out as quite isolated
results of the analysis. As pointed out by Sir W. Beveridge
((1922), p. 438), this is the case with the period of length 15.225
years suggested in his periodogram mentioned above. Against
this background, the scheme of moving averages seems more fertile.
Let us dwell a moment on this point.

In modern economic-statistical research work, a prominent line
of approach is the regression analysis of time series (cf. e.g.
C. F. Roos (1934)). Denoting a set of time series by $\zeta_t$,
$\eta^{(1)}_t$, $\eta^{(2)}_t$, ..., a simple type of approach reads

(308) $\zeta_t = c_1\,\eta^{(1)}_t + c_2\,\eta^{(2)}_t + \cdots + c_n\,\eta^{(n)}_t + \varepsilon_t,$

where $\varepsilon_t$ stands for the residual in the representation of
$\zeta_t$ as linearly correlated with the $n$ variables $\eta^{(i)}_t$. A
generalized approach is obtained by replacing $\eta^{(i)}_t$ by
$\eta^{(i)}_{t-\lambda_i}$, where the constant $\lambda_i$ represents the lag
of the series $\eta^{(i)}_t$ behind the series $\zeta_t$. The well-known
concept of distributed lag implies a further generalization, which in
the simplest case $n = 1$ leads to an approach of type

(309) $\zeta_t = b_0\,\eta_t + b_1\,\eta_{t-1} + \cdots + b_h\,\eta_{t-h} + \varepsilon_t.$

For instance, as an initial approximation we might represent a
wheat price (i.e. $\zeta$) in the year $t$ as linearly correlated with wheat
crops (i.e. $\eta$) in the years $t$, $t-1$, ..., $t-h$. According to the
theory of supply and demand, we might expect that in this case
the dominant regression coefficients $b_i$ would be negative.

Disregarding the residual $\varepsilon_t$, and interpreting in the
terminology of the present study, the series $\zeta_t$ as given by (309)
is seen to be nothing else than a moving average performed on the
primary series $\eta_t$. Consequently, a hypothetical model corresponding
to the approach (309) will be obtained by adding two independent
processes, viz. a moving average
$\{b_0\,\eta(t) + b_1\,\eta(t-1) + \cdots + b_h\,\eta(t-h)\}$ and
a residual process $\{\varepsilon(t)\}$.

Having seen that the concept of moving average may be attached
directly to the concept of distributed lag, the theory of moving
averages as developed in the present study seems to disclose new
aspects of lag problems, and to suggest methods for a deeper
analysis in this field of research. A few remarks on this line will
follow.

The idea underlying the general methods applied in the present
section is that the autocorrelation properties of a time series $\zeta_t$
will reveal whether the process of moving averages is an adequate type
of hypothetical model for $\zeta_t$. The circumstances of importance in
this connexion are (A) that in such an analysis no other time series
are taken into consideration, and (B) that a specified hypothesis of
moving averages (it should be observed that the different averages
in a group ($\zeta$) must be examined separately!) will determine a
hypothetical primary series $\eta_t$. Accordingly, independently of the
first stage of the analysis we can compare such a hypothetical
series $\eta_t$ with other time series, say $\eta^{(1)}_t$, $\eta^{(2)}_t$, etc.,
thought of as possibly affecting the series $\zeta_t$ examined. For
instance, in applying the approach (294) to the Beveridge wheat price
index, we have derived the hypothetical series $\eta_t$ given in col. (2)
of table 7. This series has been calculated merely in order to
illustrate in detail how the general inversion formula (200) works when
applied to an observational time series, but in case the parameters (b)
were significant, the series $\eta_t$ might be compared with e.g. some
appropriate wheat crop series $\eta^{(1)}_t$. Following the suggestion made
in connexion with the approach (309), we might in such a case change
simultaneously the signs in the coefficients (b) and the series $\eta_t$.

In view of the lines of research suggested above, our theoretical
analysis of moving averages calls for generalizations in various
directions. Having assumed the primary process $\{\eta(t)\}$ in (249) to
be purely random, it would in the first place be of interest to
generalize the concept of moving average by removing the restrictions
imposed on the primary process $\{\eta(t)\}$. Now, assuming only
that $\{\eta(t)\}$ is stationary and of finite dispersion, the autocorrelation
coefficients of $\{\zeta(t)\}$ will exist, and be obtained by a straightforward
generalization of the relations (250). Considering two time
series $\zeta_t$ and $\bar\eta'_t$ with correlograms $\bar r_k(\zeta)$ and $\bar r_k(\eta')$ respectively, the
generalized relations evidently may be used for finding out approximately
how a moving average with specified coefficients $(b)$ as performed
on $\bar\eta'_t$ would transform $\bar r_k(\eta')$. If the transformed correlogram
approximates $\bar r_k(\zeta)$, we are led to investigate in detail whether $\zeta_t$
approximates a moving average with these coefficients $(b)$, and with
$\bar\eta'_t$ for primary series. Concluding from theorem 9 that the
representations (200) and (303) hold even for the generalized average,
this investigation can be performed as suggested in the case of a
purely random $\{\eta(t)\}$, viz. by deriving directly from $\zeta_t$ the primary series
$\bar\eta_t$ corresponding to the coefficients $(b)$, and then correlating or
comparing in another way the two series $\bar\eta_t$ and $\bar\eta'_t$.
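The transformation of a correlogram by a moving average can be sketched numerically. The following is a minimal illustration, assuming the generalized relations take the usual form in which the covariance at lag $k$ of $\sum_i b_i \eta(t-i)$ is $\sum_{i,j} b_i b_j r_{k+i-j}(\eta)$; the function name and the truncation of unknown lags to zero are assumptions of this sketch, not part of the text.

```python
import numpy as np

def transformed_correlogram(r_in, b, max_lag):
    """Autocorrelations of sum_i b[i]*eta(t-i), given those of eta.

    r_in[k] is the autocorrelation of the primary series at lag k
    (r_in[0] == 1); lags beyond len(r_in)-1 are treated as zero,
    which is an assumption of this sketch.
    """
    r_in = np.asarray(r_in, dtype=float)
    b = np.asarray(b, dtype=float)

    def r(k):
        k = abs(k)
        return r_in[k] if k < len(r_in) else 0.0

    def cov(k):
        # covariance at lag k, up to the common factor D^2(eta)
        return sum(b[i] * b[j] * r(k + i - j)
                   for i in range(len(b)) for j in range(len(b)))

    return np.array([cov(k) / cov(0) for k in range(max_lag + 1)])
```

For a purely random primary series (`r_in = [1.0]`) and coefficients `[1.0, 0.5]` this gives a lag-one coefficient of 0.4 and zero beyond; reversing the order of the coefficients leaves the result unchanged.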

Finally, another generalization is introduced when several series
$\bar\eta_t$, $\bar\eta'_t$, ... are employed for building up the series $\zeta_t$ (cf. (308)).
Since such an approach falls under the theory of multi-dimensional
stochastical processes, a discussion of this case would be out of
place in the present study.

In examining the general methods applied to the Beveridge 
wheat price index, we have laid stress upon the different type of 
forecast delivered by the scheme of moving averages as compared 
with the scheme of hidden periodicities. The approach of moving 
averages seems particularly useful in forecasting over a short period 
of time, and attaches directly to current forecast methods, in par- 
ticular the approach of distributed lag. Having in the previous 
analysis referred throughout to economic-statistical applications, 
the present section will be concluded by a few remarks concerning 
the applicability of the scheme of moving averages in other fields 
of scientific research. 

In periodogram analysis of geophysical data — e.g. records of 
rainfall, water-levels, temperature, terrestrial magnetism, etc. — 
it is often difficult to interpret the periods suggested as physical 
realities. The situation being the same as in economic applications, 
the claiming of such periodicities has been subjected to severe 
criticism. A fact especially stressed — see e.g. an excellent critical 
survey by D. Brunt (1937) — is that these periodicities can ex- 
plain but a small, often quite insignificant fraction of the variance 
in the observational data. In view of this, the lines of research 
based on multi-dimensional regression analysis seem more promising 
(see e.g. C. W. B. Normand (1932)). Aiming at short time forecasts,
the realistic hypothesis underlying these investigations is that the 
phenomenon considered is causally connected with other phenomena 
by relations involving distributed lags. The theoretical set-up 
required having been touched upon in our discussion of economic 
applications, it is seen that the methods suggested by the theory of 
moving averages might be used also in these fields of research. 

For instance, representing the water-level in a lake by a moving 
average of the rainfall in surrounding districts, and following the 
method outlined, we are led to compare the water-level correlogram 
with the rainfall correlogram as transformed by the hypothetical 
moving average. By the courtesy of my friend B. Bruno, who
has taken an interest in my studies in time series analysis, I am
in a position to give an illustration of this on the basis of
correlograms appearing in a forthcoming paper (B. Bruno (1938)).

Starting from quarterly observations 1807-1936 of the level of
Lake Väner, 130 yearly data $\zeta_t$ were obtained by simple averaging.
Using formula (13), Bruno has derived a set of serial coefficients
$\bar r_k(\zeta)$ for each of the periods 1807-1936 and 1871-1930. The
resulting correlograms are shown in fig. 10.

If a strict periodicity were present in the material, the two
correlograms $\bar r_k(\zeta)$ should rise simultaneously to a maximum with
abscissa equalling the length of the period. However, the correlograms
fluctuating rather independently, no period is suggested in
this way. Anyhow, in view of the smallness of the serial coefficients
$\bar r_k(\zeta)$ obtained for $k > 1$, a hypothetical period cannot be expected to
explain a significant part of the variance in the series $\zeta_t$ under analysis.

Fig. 10. Correlogram of the level of Lake Väner 1807-1936 (thick line), and
1871-1930 (broken line). Transformed rainfall correlogram (small rings).

To my eye, the correlograms suggest more definitely a scheme of
moving averages of the simple type $\{\zeta(t)\} = b_0\{\eta(t)\} + b_1\{\eta(t-1)\}$,
where $\{\eta(t)\}$ is purely random. In fact, only the coefficient $\bar r_1$ is rather
large in both the correlograms, while (as should be the case if the
remaining serial coefficients were chance products) those based
on the longer period of observation are, on the whole, lying closer
to zero. Taking $\bar r_1(\zeta) = 0.4$ for the hypothetical value of $r_1(\zeta)$, we
have seen in illustration 2 in section 26 (see p. 131) that the
corresponding group $(\zeta)$ of averages is constituted by the two
processes $\{\eta(t)\} + 0.5\{\eta(t-1)\}$ and $0.5\{\eta(t)\} + \{\eta(t-1)\}$.
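The two members of the group follow from the relation $r_1 = b/(1+b^2)$ for a two-term average $\{\eta(t)\} + b\{\eta(t-1)\}$, whose two solutions are reciprocal. A minimal sketch (the function name is hypothetical):

```python
import math

def ma1_coefficients(r1):
    """Solve b / (1 + b^2) = r1; the two solutions are b and 1/b."""
    if not 0 < abs(r1) <= 0.5:
        raise ValueError("a two-term moving average requires 0 < |r1| <= 0.5")
    disc = math.sqrt(1.0 - 4.0 * r1 * r1)
    b_small = (1.0 - disc) / (2.0 * r1)
    b_large = (1.0 + disc) / (2.0 * r1)
    return b_small, b_large
```

For `r1 = 0.4` the solutions are 0.5 and 2; the second corresponds, after division by 2, to the average with coefficients (0.5, 1).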

B. Bruno has further constructed the correlogram of a series
$\bar\eta'_t$ obtained by averaging the yearly rainfall 1867-1936 in four
cities in or near the drainage-basin of Lake Väner, viz. Falun,
Karlstad, Vänersborg, and Oslo. The correlogram obtained being
shown in fig. 11, it is seen that the deviations from zero of $\bar r_k(\eta')$
are rather irregular and small.

Assuming as a first approach the serial coefficients $\bar r_k(\eta')$ to be
insignificant, we obtain a closed hypothetical model of the two
series $\zeta_t$ and $\bar\eta'_t$ by regarding the rainfall series $\bar\eta'_t$ as belonging to
a purely random process $\{\eta'(t)\}$, and the water-level series $\zeta_t$ as
belonging to one of the averages in the group constituted by
$\{\eta'(t)\} + 0.5\{\eta'(t-1)\}$ and $0.5\{\eta'(t)\} + \{\eta'(t-1)\}$.

Fig. 11. Correlogram of the rainfall 1867-1936 in the drainage basin of Lake
Väner (thick line), and the same correlogram transformed by formula (310) (small rings).

As suggested on p. 170, this simple model might be generalized
by cancelling the assumption that $\{\eta'(t)\}$ be purely random. Letting
in such a case the autocorrelation coefficients of $\{\eta'(t)\}$ be represented
by $r_k(\eta')$, a short calculation shows that those of
$\{\zeta'(t)\} = b_0\{\eta'(t)\} + b_1\{\eta'(t-1)\}$ will be given by

(310)  $$r_k(\zeta') = \frac{(b_0^2 + b_1^2)\, r_k(\eta') + b_0 b_1 \left[ r_{k+1}(\eta') + r_{k-1}(\eta') \right]}{b_0^2 + b_1^2 + 2 b_0 b_1\, r_1(\eta')}$$

(cf. a related formula given by G. U. Yule (1926), Appendix II).
Replacing $r_k(\eta')$ by $\bar r_k(\eta')$, this formula gives with good approximation
the correlogram of the series $b_0 \bar\eta'_t + b_1 \bar\eta'_{t-1}$. Putting $b_0 = 1$ and
$b_1 = 0.5$, we have in this manner obtained the transformed correlogram
indicated by small rings in fig. 11.* For direct comparison
with the water-level correlograms, the coefficients $\bar r_k(\eta')$ as transformed
by (310) have also been plotted in fig. 10. The parallelism with
the correlogram based on the period 1871-1930 is rather encouraging:
in 10 cases out of 13 the rings and the coefficients $\bar r_k(\zeta)$
vary in the same direction. Observing that the transformation
(310) is symmetrical in respect of the coefficients $(b)$, we are thus
led to examine in detail to what extent the water-level $\zeta_t$ may be
approximated by a moving average of the rainfall $\bar\eta'_t$ of the simple
form $\bar\eta'_t + 0.5\,\bar\eta'_{t-1}$ or $0.5\,\bar\eta'_t + \bar\eta'_{t-1}$.


32. Some applications of the scheme of linear autoregression. 

Having in Chapter III investigated the process of linear autore- 
gression on the basis of a theory of stochastical difference equations, 
we shall in the present section give a few applications of this 
scheme. In doing this, we shall proceed as in the previous section. 
Choosing an economic time series as our experimental object, we 
shall first illustrate in detail a general method for determining the 
parameters when applying a scheme of linear autoregression. The 
modest purpose being to show how the method works, the tests 
touched upon in the following discussion will not be applied for 
examining the significance of the parameters arrived at. In discus- 
sing the results, the analysis will instead be focussed on a qualita- 
tive comparison with other hypothetical schemes, and with certain 
related lines of economic-statistical research work. Following up 
the parallelism with the applications of the scheme of moving 
averages, this section will be concluded by referring to a few other 
fields of scientific research which invite an application of the scheme 
of linear autoregression. 

* Similarly, (310) gives the serial coefficients of the series extracted in table 2
in terms of the serial coefficients $\bar r_k$ given on p. 60.

The time series dealt with in the experiments accounted for in
the following is the Swedish cost of living index 1830-1913
compiled by G. Myrdal ((1933), Table A, budget b). Since the
index presents a marked trend, this had to be removed before
starting the analysis (cf. p. 146). Having used the 21-term formula
of J. Spencer for this purpose (see e.g. E. T. Whittaker and
G. Robinson (1926), p. 290 f), the deviations from the graduated
index are shown in fig. 12 (thick line). The graduation being
disturbed by a rapid rise of the index in the years 1851-57, the
values obtained for this period were subjected to a slight adjustment
(broken line). Another graduation by hand was used for the
years 1904-13 not covered by the Spencer formula (broken line).
In order to avoid decimals, the resulting 74 deviations 1840-1913
were multiplied by 10. These index fluctuations, say $\zeta_t$, constitute
our experimental series, and are shown in fig. 12. The numerical
values of $\zeta_t$ are given in col. (1) of table 8.
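The trend removal can be sketched as follows. The weights are the usual composite form of Spencer's 21-term formula (moving sums of 5, 5 and 7 terms followed by the weights $-1, 0, 1, 2, 1, 0, -1$, divided by 350); this construction and the function names are assumptions of the sketch and are not quoted from the text. The hand adjustments described above are not reproduced.

```python
import numpy as np

def spencer21_weights():
    # Composite form: [5][5][7] moving sums followed by the
    # seven-term weights (-1, 0, 1, 2, 1, 0, -1), divided by 350.
    w = np.convolve(np.ones(5), np.ones(5))
    w = np.convolve(w, np.ones(7))
    w = np.convolve(w, [-1, 0, 1, 2, 1, 0, -1])
    return w / 350.0

def detrend(series):
    """Deviations from the graduated values (10 terms lost at each end)."""
    x = np.asarray(series, dtype=float)
    w = spencer21_weights()            # symmetric, so no reversal issue
    trend = np.convolve(x, w, mode="valid")
    return x[10:-10] - trend
```

Applied to 84 yearly values (1830-1913) the formula covers the 64 years 1840-1903, which agrees with the remark that the years 1904-13 needed a separate graduation.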


Table 8. Fluctuations in the G. Myrdal cost of living index (col. 1),
and hypothetical primary series $\bar\eta_t$ (col. 2).

Year   (1)     (2)  |  Year   (1)     (2)  |  Year   (1)     (2)  |  Year   (1)     (2)
1840     0      .   |  1860   -33    19.7  |  1880   -14    36.9  |  1900    34     5.5
1841    16      .   |  1861    21    12.6  |  1881    32     2.1  |  1901    -5    -9.1
1842    31      .   |  1862    58    20.4  |  1882    24    -5.9  |  1902    -9    18.8
1843    -6      .   |  1863     8   -42.4  |  1883    40    23.3  |  1903    -2    11.2
1844   -56   -29.6  |  1864   -35   -11.4  |  1884    24     3.7  |  1904   -30   -21.2
1845   -21    21.8  |  1865   -39    -5.2  |  1885    -3    14.5  |  1905   -21     3.6
1846    16     6.9  |  1866   -13    12.2  |  1886   -35   -11.9  |  1906   -14   -17.6
1847    35    16.9  |  1867    43    36.5  |  1887   -60   -23.6  |  1907    29    31.9
1848     9   -18.0  |  1868    71    26.6  |  1888   -26     7.1  |  1908    33    -2.2
1849   -13    -1.6  |  1869    -1   -33.8  |  1889    12    -1.9  |  1909     8     0.6
1850   -36   -17.0  |  1870   -59   -16.7  |  1890    29     2.7  |  1910    -5     6.1
1851   -50   -19.0  |  1871   -54    -3.0  |  1891    56    30.4  |  1911   -34   -17.8
1852   -58   -36.4  |  1872   -36   -10.5  |  1892    39     8.8  |  1912    31    65.4
1853   -64   -49.0  |  1873    25    26.8  |  1893     2    12.2  |  1913    24   -17.4
1854   -48   -38.6  |  1874    48    -3.0  |  1894   -42   -13.3  |  Sum   -198   -47.6
1855   -11   -21.1  |  1875    30     1.6  |  1895   -26    21.9  |
1856    68    38.5  |  1876    32    30.3  |  1896   -38   -32.4  |
1857    66    -3.7  |  1877    31    30.2  |  1897   -19     1.9  |
1858   -48   -62.8  |  1878   -33   -22.6  |  1898    11    -6.0  |
1859   -92   -19.3  |  1879   -84   -30.0  |  1899    39    17.9  |
Our series $\zeta_t$ is seen to reflect clearly the changes between
economic expansion and contraction. A certain regularity seems to
be present in the movement up and down, but the distance between
two adjacent maxima is rather inconstant, varying between some 5
and 10 years. The structure of the fluctuations is summed up in
the correlogram obtained from formula (13). This is shown in fig. 14,
and the numerical values are given in col. (1) of table 9.

The correlogram looks rather like a simple damped oscillation, say
$C \cdot q^k \cdot \cos(\lambda k + \varphi)$. An inspection of the graph shows that in
approximating the correlogram by such a function we would have
to take the period $p = 2\pi/\lambda$ to be about 7 or 8 years, the phase
$\varphi$ to be approximately vanishing, and $q^7 \approx 1/2$, the latter relation
corresponding to a damping of some 50 % in the duration of one
period.

According to the theoretical developments in section 25, a process
$\{\zeta(t)\}$ of linear autoregression as defined by a relation of type

(311)  $$\zeta(t) + a_1 \zeta(t-1) + a_2 \zeta(t-2) = \eta(t)$$

will present a correlogram forming a simple damped harmonic. On
the other hand, in a scheme of hidden periodicities, each of the
harmonic components will give rise to an undamped harmonic in
the correlogram. We conclude that such a correlogram cannot
approximate the graph of serial coefficients unless at least two
harmonics are superposed. However, such a scheme would involve
6 or more parameters, while only two are required in (311). In
seeking a simple hypothetical scheme with correlogram approximating
$\bar r_k$ as shown in fig. 14, we are thus led to try first a process
of linear autoregression, and firstly one of the simple type defined
by (311).

In a general process (219) of linear autoregression, the linear
system (222) with coefficients $a_1, \ldots, a_h$ will deliver the
autocorrelation coefficients $r_1, \ldots, r_{h-1}$ required for deriving the following
coefficients $r_h$, $r_{h+1}$, etc. from the difference relations (221). In
searching for an adequate scheme (311), we are confronted with
the inverse problem, viz. to find a set of coefficients $a_1, \ldots, a_h$
giving rise to a hypothetical process (311) with correlogram
approximating a prescribed, empirical correlogram. Now, observing
that the relations (221)-(223) are linear also in respect of the
coefficients $(a)$, a convenient starting point for attacking the problem
before us would be to replace the coefficients $r_k$ in (222) and the
last relation (221), a system whose determinant is $\neq 0$,
by the prescribed values $\bar r_k$ and to solve instead for the coefficients $(a)$.
The system yielding our trial set $(a)$ thus will read

(312)  $$\begin{aligned}
\bar r_1 + a_1 + a_2 \bar r_1 + a_3 \bar r_2 + \cdots + a_h \bar r_{h-1} &= 0 \\
\bar r_2 + a_1 \bar r_1 + a_2 + a_3 \bar r_1 + \cdots + a_h \bar r_{h-2} &= 0 \\
\cdots\cdots \\
\bar r_h + a_1 \bar r_{h-1} + a_2 \bar r_{h-2} + a_3 \bar r_{h-3} + \cdots + a_h &= 0.
\end{aligned}$$

If the roots of the equation (34) formed by the resulting coefficients
$(a)$ are lying in the unit circle, these coefficients will define a
process (219) of linear autoregression. By construction, the
autocorrelation coefficients $r_1, \ldots, r_h$ of this process will coincide with the
prescribed values. The following coefficients $r_{h+1}$, $r_{h+2}$, etc. will be
obtained from the difference relations (221) formed by the
hypothetical coefficients $(a)$.

Having derived the correlogram $r_k$ of the hypothetical process,
we are in a position to compare it with the empirical correlogram $\bar r_k$.
If the fit seems satisfactory, we may carry the analysis further on
the basis of the coefficients $(a)$ found, but if the deviations seem
too large, an adjustment in the coefficients $(a)$ is called for.

In analogy with the case of moving averages, we obtain according
to (219) a primary series

(313)  $$\bar\eta_t = \zeta_t - m + a_1(\zeta_{t-1} - m) + \cdots + a_h(\zeta_{t-h} - m)$$

corresponding to our set $(a)$. In other words, the complete
hypothetical model will consist of $h$ parameters $(a)$ and the primary
series $\bar\eta_t$; using these quantities, we can reconstruct the original
series $\zeta_t$. In this case, too, we may look upon the quotient
$\kappa^2 = \Sigma \bar\eta_t^2 / \Sigma \zeta_t^2$ as a measure of the efficiency of the analysis: the
closer to zero our $\kappa^2$, the less important is the 'unexplained' random
component, and the greater the efficiency of the approach.

As mentioned in section 29, the method of G. U. Yule (1927)
for determining the coefficients $(a)$ in the approach (313) is to
minimize $\Sigma \bar\eta_t^2$. The sum $\Sigma \zeta_t^2$ being constant, this method will make
$\kappa^2$ a minimum, and may thus be said to be of maximum efficiency.
Having stated this, let it be pointed out explicitly that the Yule
method gives a system for determining the coefficients $(a)$, which
is equivalent to the system (312) started from above. In fact,
minimizing $\Sigma \bar\eta_t^2$ as defined by (313), we obtain a set of normal
equations, where, apart from a constant factor, the coefficients of the
$a_j$'s will obviously approximate the corresponding serial coefficients
in the system (312).
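The system (312) is linear in the coefficients $(a)$ and can be solved directly. A minimal sketch, assuming the prescribed coefficients $\bar r_1, \ldots, \bar r_h$ are supplied as a list (the function name is hypothetical):

```python
import numpy as np

def solve_system_312(r_bar):
    """Solve r_k + sum_j a_j r_|k-j| = 0 (k = 1..h) for a_1..a_h.

    r_bar holds the prescribed coefficients r_1..r_h (r_0 = 1 implied).
    """
    r = np.concatenate(([1.0], np.asarray(r_bar, dtype=float)))
    h = len(r) - 1
    # Row k, column j carries r_|k-j|, the factor of a_j in equation k.
    R = np.array([[r[abs(k - j)] for j in range(1, h + 1)]
                  for k in range(1, h + 1)])
    return np.linalg.solve(R, -r[1:])
```

With the two serial coefficients 0.5216 and -0.2240 this returns approximately (-0.8771, 0.6815). Whether the resulting set defines a process of linear autoregression still has to be checked on the roots of the characteristic equation.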

The variance of the process $\{\eta(t)\}$ defined by our hypothetical set
$(a)$ will be given by formula (220). Of course, this relation holds
irrespective of the method used in determining the set $(a)$. On the
other hand, choosing the system (312) for determining the
coefficients $(a)$, the resulting primary series $\bar\eta_t$ will evidently satisfy the
parallel relation

(314)  $$D^2(\bar\eta_t) \cong (1 + a_1 \bar r_1 + a_2 \bar r_2 + \cdots + a_h \bar r_h) \cdot D^2(\zeta_t),$$

where the sign $\cong$ covers the approximation made in disregarding
the first $h$ terms in the series $\zeta_t$ for which we cannot calculate
corresponding elements $\bar\eta_t$. In other words, the variance of the
primary series $\bar\eta_t$ will approximate the hypothetical value $D^2(\eta)$. It
must be kept in mind that this will not always be the case if the
trial set $(a)$ is determined otherwise than by (312) (cf. p. 145).
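The primary series of (313) and the efficiency quotient can be computed directly from an observed series. A minimal sketch; the function names are hypothetical, and in the usage note the mean $m = -198/74$ is an assumption taken from the sum row of table 8.

```python
import numpy as np

def primary_series(zeta, a, m):
    """Residuals by (313): (z_t - m) + a_1 (z_{t-1} - m) + ... + a_h (z_{t-h} - m).

    The first h terms of zeta yield no residual and are dropped.
    """
    z = np.asarray(zeta, dtype=float) - m
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    # In a convolution coeffs[i] multiplies z[t-i]; "valid" keeps t >= h only.
    return np.convolve(z, coeffs, mode="valid")

def efficiency_quotient(eta, zeta):
    """Sum of squared residuals divided by sum of squared observations."""
    eta = np.asarray(eta, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    return float(np.dot(eta, eta) / np.dot(zeta, zeta))
```

For instance, with the four coefficients of the approach (319) and the table 8 values for 1909-1913 (8, -5, -34, 31, 24), `primary_series` returns a single residual of about -17.4, the value tabulated for 1913.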

Summing up, the system (312) will give us a set $a_1, \ldots, a_h$ which
will minimize the variance of the corresponding residuals $\bar\eta_t$. Further,
the first $h$ autocorrelation coefficients will coincide with the
corresponding serial coefficients. However, the hypothetical
correlogram will not always in its whole range yield a good fit to the
empirical correlogram. In practice, we can compromise between
the two desiderata of obtaining small residuals $\bar\eta_t$ and small
deviations between the correlograms, and besides try to satisfy the
relation $D^2(\eta) \cong D^2(\bar\eta_t)$. Before discussing this matter, let us see in
detail how the method outlined will function when applied to the
G. Myrdal cost of living index.

Forming the system (312) for $h = 2$, inserting the values $\bar r_1 = 0.5216$,
$\bar r_2 = -0.2240$ given in col. (1) of table 9, and solving for the
coefficients $a_1$ and $a_2$, we obtain $a_1 = -0.8771$, $a_2 = 0.6815$. The roots
of the characteristic equation $z^2 + a_1 z + a_2 = 0$ being $0.4385 \pm 0.6994\,i$
and thus less than unity in modulus, we conclude that the relation

(315)  $$\zeta(t) - 0.8771\,\zeta(t-1) + 0.6815\,\zeta(t-2) = \eta(t)$$

will define a process of linear autoregression. By construction, the
first two autocorrelation coefficients of this process $\{\zeta(t)\}$ read
$r_1 = \bar r_1 = 0.5216$, $r_2 = \bar r_2 = -0.2240$, while the following coefficients will
be obtained recurrently from the difference relation
$r_k - 0.8771\, r_{k-1} + 0.6815\, r_{k-2} = 0$. The resulting correlogram, which is
evidently of the form (243), is shown in fig. 14 (thin line).
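The recurrent calculation of the hypothetical correlogram is easily mechanized. A minimal sketch, assuming the difference relation has the form $r_k + a_1 r_{k-1} + \cdots + a_h r_{k-h} = 0$ as used above (the function name is hypothetical):

```python
def ar_correlogram(a, r_start, max_lag):
    """Extend r_1..r_h by r_k = -(a_1 r_{k-1} + ... + a_h r_{k-h}).

    r_start must supply at least h starting coefficients r_1..r_h.
    """
    h = len(a)
    if len(r_start) < h:
        raise ValueError("need at least h starting coefficients")
    r = [1.0] + list(r_start)          # r[0] = 1 by definition
    for k in range(len(r), max_lag + 1):
        r.append(-sum(a[j] * r[k - 1 - j] for j in range(h)))
    return r
```

For the approach (315), `ar_correlogram([-0.8771, 0.6815], [0.5216, -0.2240], 20)` gives a third coefficient of about -0.552, after which the values continue as a damped oscillation.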
Table 9. Serial coefficients $\bar r_k$ of the G. Myrdal cost of living index
(col. (1))*, and autocorrelation coefficients $r_k$ belonging to the schemes
(318) (col. (2)), and (319) (col. (3)).

  k     (1)       (2)       (3)    |   k     (1)       (2)       (3)
  1    0.5216    0.5216    0.5386  |  11   -0.1533   -0.1722   -0.2170
  2   -0.2240   -0.2240   -0.1460  |  12   -0.2530   -0.0218   -0.1536
  3   -0.5811   -0.5811   -0.5024  |  13   -0.2254    0.1065   -0.0119
  4   -0.4626   -0.4626   -0.5105  |  14   -0.0042    0.1311    0.1042
  5   -0.0903   -0.0734   -0.2320  |  15    0.1883    0.0609    0.1318
  6    0.2085    0.2749    0.1417  |  16    0.1001   -0.0333    0.0747
  7    0.3138    0.3538    0.3458  |  17    0.0723   -0.0818   -0.0140
  8    0.2613    0.1717    0.2902  |  18   -0.0130   -0.0630   -0.0743
  9    0.1434   -0.0820    0.0772  |  19   -0.0067   -0.0062   -0.0768
 10   -0.0034   -0.2172   -0.1321  |  20    0.0842    0.0408   -0.0327

Comparing with the empirical correlogram, it is seen that the
period in the hypothetical correlogram is too short, and that the
damping is a little too heavy. According to section 6, the damping
factor equals $\sqrt{a_2}$, while the period is given by $p = 2\pi/\lambda$, where
$\cos\lambda = -a_1/(2\sqrt{a_2})$. Thus, an increase in $a_2$ will bring on a slighter
damping. Further, reducing $\lambda$ we obtain a longer period. However,
as pointed out in the previous discussion, we cannot conclude without
further evidence that it will be possible to improve the fit:
the coefficients $a_1$ and $a_2$ determine also the constant factor and
the phase of the damped harmonic, and it might happen that an
adjustment in $a_1$ and $a_2$ would cause such a change, e.g. in the phase,
that the total result of the adjustment would be a poorer fit.
Proceeding with the illustration, we shall next examine the total effect
of an adjustment.
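The two relations just quoted translate into a short calculation. A minimal sketch (the function name is hypothetical):

```python
import math

def damping_and_period(a1, a2):
    """Damping factor sqrt(a2) and period 2*pi/lambda for the complex
    roots of z^2 + a1*z + a2 = 0, where cos(lambda) = -a1 / (2*sqrt(a2))."""
    if a1 * a1 >= 4.0 * a2:
        raise ValueError("roots are real; the correlogram does not oscillate")
    q = math.sqrt(a2)
    lam = math.acos(-a1 / (2.0 * q))
    return q, 2.0 * math.pi / lam
```

The pair (-0.8771, 0.6815) yields a period of about 6.22 years, and the pair (-1.10, 0.77) a period of about 7.03 years with a damping factor nearer to unity.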

In the correlogram of the approach (315), the period is found to
be 6.22 years, while a good fit would require a period not below
7 years. Reducing the damping by increasing $a_2$ from 0.6815 to
0.77, a short calculation will verify that $a_1 = -1.10$ will give a period
$p = 7.03$ years. Now, let us examine the approach

(316)  $$\zeta(t) - 1.10\,\zeta(t-1) + 0.77\,\zeta(t-2) = \eta(t).$$

* In reading the proofs, a slight error was discovered in the serial coefficients:
they should all have been multiplied by a factor amounting to about 1.005.
The correlogram of the process $\{\zeta(t)\}$ thus defined has been
calculated from (221) and (222), and is shown in fig. 14 (broken line).
Up to $r_8$ and $r_9$, the hypothetical correlogram seems to fit rather
well. Beyond this point, the fit is less satisfactory, partly because
the graph of serial coefficients presents a slow descent to the
minimum in $k \approx 12.5$, and a rapid rise to the next maximum. This
skewness will be returned to later.

A clear view of the adjustment will be obtained by calculating
the roots of the characteristic equations of (315) and (316). In
fig. 13, these roots are indicated by small rings. In drawing the
conjugate roots nearer to the periphery of the unit circle, the
damping has been reduced, while the reduction in the angle $\lambda$ has
elongated the period.

Fig. 13. Adjustment paths in the approaches (315) (rings), and (318) (crosses).

Each of the approaches (315) and (316) gives rise to a primary
series $\bar\eta_t$, and we know from a previous remark of general scope
that the improvement in the fit of the hypothetical correlogram is
obtained at the expense of an increase in the variance of the series
$\bar\eta_t$. In the approach (315), which is based on a system of type
(312), the relation (314) will hold.

Thus, paying regard to (220), and inserting $a_1 = -0.8771$,
$a_2 = 0.6815$, $r_1 = 0.5216$, $r_2 = -0.2240$, we obtain in this case
$D^2(\bar\eta_t) \cong D^2(\eta) = 0.390\,D^2(\zeta)$. As to the residuals $\bar\eta_t$ derived from the
approach (316), the development

(317)  $$D^2(\bar\eta_t) = D^2(\zeta_t + a_1 \zeta_{t-1} + a_2 \zeta_{t-2}) \cong (1 + a_1^2 + a_2^2 + 2 a_1 \bar r_1 + 2 a_2 \bar r_2 + 2 a_1 a_2 \bar r_1) \cdot D^2(\zeta_t)$$

will not reduce to (314). Inserting $a_1 = -1.10$, $a_2 = 0.77$, $\bar r_1 = 0.5216$,
$\bar r_2 = -0.2240$, we find in this case $D^2(\bar\eta_t) \cong 0.427\,D^2(\zeta_t)$. In full
agreement with the general theory, the adjustment in the coefficients $(a)$
has reduced the efficiency of the approach. As mentioned in
connexion with (314), the hypothetical variance $D^2(\eta)$ will also be
affected by the adjustment. Generally speaking, nothing compels
$D^2(\eta)$ to follow the variation in $D^2(\bar\eta_t)$. Actually, in the present
case $D^2(\eta)$ varies contrarily to $D^2(\bar\eta_t)$: the first two
autocorrelation coefficients in the scheme (316) being $r_1 = 0.6215$, $r_2 = -0.0864$
(cf. fig. 14), it is readily verified by inserting these values together
with $a_1 = -1.10$, $a_2 = 0.77$ in formula (220) that the variance in
question will be given by $D^2(\eta) = 0.250\,D^2(\zeta)$.
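The three variance factors quoted above are instances of one quadratic form: with $c = (1, a_1, \ldots, a_h)$, the variance of $\sum_i c_i \zeta_{t-i}$ relative to that of $\zeta_t$ is $\sum_{i,j} c_i c_j r_{|i-j|}$. A minimal sketch (the function name is hypothetical; end effects of a finite series are ignored):

```python
import numpy as np

def residual_variance_ratio(a, r):
    """Variance of zeta_t + a_1 zeta_{t-1} + ... + a_h zeta_{t-h}
    relative to that of zeta_t; r holds r_1..r_h of the series."""
    c = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    rr = np.concatenate(([1.0], np.asarray(r, dtype=float)))
    n = len(c)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return float(c @ rr[lags] @ c)
```

Inserting the coefficients of (315) with the serial coefficients gives about 0.390; the coefficients of (316) give about 0.427 with the serial coefficients and about 0.250 with the scheme's own autocorrelation coefficients 0.6215 and -0.0864.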



Fig. 14. Correlogram of the G. Myrdal cost of living index (thick line), and
hypothetical correlogram corresponding to formula (315) (thin line), and formula
(316) (broken line).

Summing up the comparison between the approaches (315) and
(316), the hypothetical correlogram fits better in the latter case,
but the variance $D^2(\bar\eta)$ is smaller in the former case and coincides
with the hypothetical variance $D^2(\eta)$; in the approach (316) the
difference between these variances is rather large. All in all,
neither of the schemes seems adequate. In view of other experiments
with approaches of the simple type (311), it seems as if we
cannot find a satisfactory approach without taking into account
more distant elements $\zeta_{t-3}$, $\zeta_{t-4}$, etc.

Of course, having different desiderata to comply with in applying
a scheme of linear autoregression, we could agree on which weights
to attach to them, and then take them into consideration
simultaneously. In point of principle, it would then be possible to find
out a set $(a)$ forming the best compromise in the sense agreed.
However, judging from certain experiments of this kind, what might
be gained in this way seems not worth the extensive computations
involved. One way or another, the results arrived at in these
experiments merit no recital. Accordingly, proceeding to an account
of certain experiments with a scheme (219) involving four parameters
$(a)$, we shall follow the same line as before.

Taking $h = 4$, and inserting in the system (312) the serial
coefficients $\bar r_k$ given in table 9, we arrive at the following approach

(318)  $$\zeta(t) - 0.8100\,\zeta(t-1) + 0.7452\,\zeta(t-2) - 0.0987\,\zeta(t-3) + 0.2101\,\zeta(t-4) = \eta(t).$$

Fig. 15. Correlogram of the G. Myrdal cost of living index (thick line), and
hypothetical correlogram corresponding to formula (318) (thin line), and formula
(319) (broken line).

The autocorrelation coefficients of the process $\{\zeta(t)\}$ thus defined
have been calculated from (221) and (222). The values found are
given in col. (2) of table 9, and plotted in fig. 15 (thin line).

It is seen that the approaches (316) and (318) give rise to almost
coincident correlograms, only that here we have by construction
$r_k = \bar r_k$ for $k \le 4$. As shown in fig. 13, this conformity is reflected
in the characteristic equations: in the present case two of the
four roots are lying in the close neighbourhood of the roots
belonging to the approach (316). The numerical values found for
the roots in the approach (318) read $0.5385 \pm 0.6814\,i$, $-0.1335 \pm 0.5106\,i$.
In adjusting the approach (318), we are at liberty to move the
roots of the characteristic equation in any directions, keeping
in mind that complex roots must be conjugate. The dominant
component in the correlogram evidently corresponds to the roots
$0.5385 \pm 0.6814\,i$, say $q \cdot e^{\pm i\lambda}$. The period of this component is nearly
7 years, while the empirical correlogram suggests a somewhat longer
period. Now, reducing the angle $\lambda$ so as to obtain a period
equalling 7.5 years, the resulting set of coefficients $(a)$ gave rise to a
correlogram with reduced amplitudes. Neutralizing this effect by
reducing the damping by means of a slight move towards the
periphery of the unit circle, it was found adequate to perform a
simultaneous move in the other two roots. As a matter of fact,
the deviations between $r_k$ and $\bar r_k$ for $k = 1, \ldots, 4$ caused by the
adjustments mentioned were found to be substantially reduced by
moving the root $-0.1335 + 0.5106\,i$ in a direction nearly opposite
to the adjustment in the root $0.5385 + 0.6814\,i$.

Having thus arrived at the roots $0.5888 \pm 0.6540\,i$, $-0.20 \pm 0.58\,i$,
the paths followed are indicated in fig. 13. As is readily verified,
the adjusted roots belong to the approach

(319)  $$\zeta(t) - 0.7776\,\zeta(t-1) + 0.6797\,\zeta(t-2) - 0.1342\,\zeta(t-3) + 0.2914\,\zeta(t-4) = \eta(t).$$
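The passage from adjusted roots back to coefficients can be checked numerically. A minimal sketch using NumPy's `poly` and `roots` (the helper name is hypothetical); since the roots are quoted to a few decimals only, the coefficients are recovered to about three decimals.

```python
import numpy as np

def coefficients_from_roots(upper_half_roots):
    """Coefficients a_1..a_h of z^h + a_1 z^(h-1) + ... + a_h whose roots
    are the given complex numbers together with their conjugates."""
    roots = []
    for z in upper_half_roots:
        roots += [z, np.conj(z)]
    poly = np.poly(roots)              # monic polynomial, leading term first
    return np.real(poly[1:])

a_adjusted = coefficients_from_roots([0.5888 + 0.6540j, -0.20 + 0.58j])
roots_318 = np.roots([1.0, -0.8100, 0.7452, -0.0987, 0.2101])
```

Here `a_adjusted` agrees with the coefficients of (319) to within the rounding of the roots, and all four entries of `roots_318` lie inside the unit circle.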

The autocorrelation coefficients of this scheme as derived from the
system (222) and the relations (221) are given in col. (3) of table 9,
and plotted in fig. 15 (broken line). The fit to the empirical
correlogram is not very close, but the general shape of the hypothetical
correlogram is rather satisfactory. Comparing with the approach
(316), which involves only two parameters, it is seen that the
improvement bears chiefly upon the period and the coefficient $r_1$.
Using formula (314), and paying regard to the identities $r_k = \bar r_k$ for
$k \le 4$, the approach (318) gives $D^2(\eta) = 0.371\,D^2(\zeta)$, $D^2(\bar\eta) \cong 0.371\,D^2(\zeta)$.
Comparing with the simple scheme (315), it is seen that the
reduction of the factor in $D^2(\eta)$ and $D^2(\bar\eta)$ amounts to only 0.019. In
other words, the introduction of two more parameters has brought
on but a slight increase in the efficiency of the approach. However,
comparing the adjusted schemes, we find a definite improvement.
Proceeding as in (317), we find in the first place that the residuals
derived from (319) will satisfy the relation $D^2(\bar\eta) \cong 0.381\,D^2(\zeta)$
(working directly on the series $\zeta_t$ and $\bar\eta_t$ given in table 8, we find
$D^2(\bar\eta_t) = 0.385\,D^2(\zeta_t)$). The adjustment has thus reduced the
efficiency of the approach but very slightly. On the other hand,
applying formula (220), we find that the parallel hypothetical relation
reads $D^2(\eta) = 0.401\,D^2(\zeta)$. Contrary to the situation in the approach
(316), it is seen that in the present case $D^2(\bar\eta)$ approximates $D^2(\eta)$
even after the adjustment.
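The derivation of the hypothetical coefficients from a given set $(a)$ amounts to solving a linear system in $r_1, \ldots, r_h$. The sketch below assumes that system to be $r_k + \sum_j a_j r_{|k-j|} = 0$ for $k = 1, \ldots, h$ with $r_0 = 1$, and takes the variance factor as $1 + \sum_k a_k r_k$; the function name is hypothetical and the forms of (220) and (222) are inferred from the figures quoted in the text.

```python
import numpy as np

def autocorrelations_from_coefficients(a):
    """Solve r_k + sum_j a_j r_|k-j| = 0 (k = 1..h, r_0 = 1) for r_1..r_h."""
    a = np.asarray(a, dtype=float)
    h = len(a)
    M = np.eye(h)                      # the leading r_k in each equation
    rhs = np.zeros(h)
    for k in range(1, h + 1):
        for j in range(1, h + 1):
            lag = abs(k - j)
            if lag == 0:
                rhs[k - 1] -= a[j - 1]         # a_k * r_0 moves to the right side
            else:
                M[k - 1, lag - 1] += a[j - 1]  # a_j multiplies r_|k-j|
    return np.linalg.solve(M, rhs)

a319 = [-0.7776, 0.6797, -0.1342, 0.2914]
r319 = autocorrelations_from_coefficients(a319)
variance_factor = 1.0 + float(np.dot(a319, r319))
```

For (319) this reproduces the first entries of col. (3) of table 9 and a variance factor of about 0.401; for the two coefficients of (316) it gives 0.6215 and -0.0864.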

Perhaps it would be possible to find an adjustment improving
the approach (319). However, in view of the above figures, not
much can be gained by a continued adjusting. Nor does it seem
as if a real improvement could be secured by enlarging our set $(a)$.
Be that as it may, the above examples are sufficient for our purpose
of illustrating a general method for deriving a trial set of
coefficients $(a)$ in applying a process of linear autoregression, and for
performing adjustments in the trial set.

Fig. 16. G. Myrdal cost of living index $\zeta_t$. The scatters $(\zeta_t, \zeta_{t-1})$ (left), and
$(\zeta_t, \zeta_{t-2})$ (right).

Proceeding to a discussion of the approach (319), the same general 
viewpoints present themselves as in the case of moving averages. 
Accordingly, referring to the remarks in the previous section (cf. 
p. 163), we need draw attention only to a few circumstances which 
are peculiar to the scheme of linear autoregression. 

As indicated by the term proposed, the hypothesis of linear
autoregression implies a linear regression upon $\zeta_t$ of each of the
preceding elements $\zeta_{t-1}$, $\zeta_{t-2}$, etc. This circumstance having already
been employed by G. U. Yule (1927) for testing an approach of
this kind (cf. section 29, p. 141), we show in fig. 16 the scatters
$(\zeta_t, \zeta_{t-1})$ and $(\zeta_t, \zeta_{t-2})$ of the fluctuations in the Myrdal index.
The deviations from linearity in the connexion between the variables
do not seem disturbing.

As in the case of moving averages, different tests of a scheme
of linear autoregression may be based on the hypothetical randomness
of the primary series $\bar\eta_t$. Such tests may, for example, be
focussed upon the scatters $(\bar\eta_t, \bar\eta_{t-k})$. The scatter $(\bar\eta_t, \bar\eta_{t-1})$ formed
by the residuals $\bar\eta_t$ given in table 8 is shown in fig. 17.

The above mentioned skewness in the correlogram of the G.
Myrdal index (see fig. 14) gives an interesting illustration of the
difficulties of testing time series schemes. Is it permitted to look
upon the deviation from the hypothetical correlogram as produced by
pure chance? An examination of this question must pay due regard
to the interdependence of the serial coefficients in a sample series.
For instance, a chance deviation in $k = 13$ would perhaps be most
often attended by such deviations of the neighbouring coefficients
that the total picture would present a skew oscillation. The question
of how much weight to attach to deviations of this and similar kinds
seems extremely intricate. Perhaps nothing better can be done than
to compute a large number of model series correlograms, and then
to compare the deviations from the hypothetical curve.

Fig. 17. Scatter $(\bar\eta_t, \bar\eta_{t-1})$ of a hypothetical primary series of the
G. Myrdal cost of living index.
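The model-series experiment suggested here is straightforward to sketch with pseudo-random numbers. The following is only an illustration: normally distributed shocks, the burn-in length, the seed, and the particular sample autocorrelation estimator (which may differ in detail from formula (13)) are all assumptions of the sketch.

```python
import numpy as np

def serial_coefficients(x, max_lag):
    """Sample autocorrelations about the sample mean (one common estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def simulate_correlograms(a, n_obs, n_series, max_lag, seed=0, burn_in=200):
    """Correlograms of n_series artificial series from the scheme
    zeta(t) + a_1 zeta(t-1) + ... + a_h zeta(t-h) = eta(t)."""
    rng = np.random.default_rng(seed)
    h = len(a)
    out = np.empty((n_series, max_lag + 1))
    for s in range(n_series):
        shocks = rng.standard_normal(n_obs + burn_in)
        z = np.zeros(n_obs + burn_in)
        for t in range(h, len(z)):
            z[t] = shocks[t] - sum(a[j] * z[t - 1 - j] for j in range(h))
        out[s] = serial_coefficients(z[burn_in:], max_lag)  # drop the burn-in
    return out
```

Calling `simulate_correlograms([-0.7776, 0.6797, -0.1342, 0.2914], 74, 200, 20)` gives 200 correlograms of series as long as the index series; the spread of each column around the hypothetical value indicates how large a chance deviation at that lag may be, and the correlation between neighbouring columns shows their interdependence.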

As pointed out in section 24, an approach of linear autoregression
will give rise to a forecast curve which forms a damped oscillation
of type (33), and satisfies the same linear difference equation as the
autocorrelation coefficients. Letting as before $F_t[\zeta(t+k)]$
represent a forecast over $k$ time units, formula (214) shows that
$F_t[\zeta(t+k)]$ may be conveniently computed recurrently. For
instance, considering the approach (319), we get

$$F_t[\zeta(t+1)] = (1 + a_1 + a_2 + a_3 + a_4)\, m - a_1 \zeta_t - a_2 \zeta_{t-1} - a_3 \zeta_{t-2} - a_4 \zeta_{t-3},$$

$$F_t[\zeta(t+2)] = (1 + a_1 + a_2 + a_3 + a_4)\, m - a_1 F_t[\zeta(t+1)] - a_2 \zeta_t - a_3 \zeta_{t-1} - a_4 \zeta_{t-2},$$

etc. In particular, inserting for $t = 1912$ the values of $\zeta_t$
given in col. (1) of table 8, we find $F_{1912}[\zeta(1913)] = 41.4$,
$F_{1912}[\zeta(1914)] = 5.2$, etc. The forecast curve
$F_{1912}[\zeta(1912+k)]$ is shown in fig. 12 (dashes and dots). A
check of the first forecast is obtained by adding the hypothetical
residual (random shock) $\eta_{1913} = -17.4$; the result is 24,
which equals $\zeta_{1913}$. Fig. 12 shows also the forecast curve
$F_{1913}[\zeta(1913+k)]$ (dotted line).
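
The recurrent computation of the forecast curve may be sketched as
follows. This is a minimal illustration, assuming a scheme with four
coefficients and a mean level m; the numerical coefficients and
observations are hypothetical and are not those of the approach (319)
or of table 8.

```python
def forecast_path(history, a, m, steps):
    """Recurrent forecasts F_t[z(t+1)], ..., F_t[z(t+steps)].

    history: observed values, oldest first, ending with z(t); len >= len(a).
    a:       autoregression coefficients a_1, ..., a_h (illustrative values).
    m:       mean level of the series.
    """
    h = len(a)
    level = (1.0 + sum(a)) * m           # constant term (1 + a_1 + ... + a_h) m
    window = list(history[-h:])          # the h most recent values, oldest first
    path = []
    for _ in range(steps):
        # the newest value pairs with a_1, the next newest with a_2, and so on
        nxt = level - sum(a_j * z for a_j, z in zip(a, reversed(window)))
        path.append(nxt)
        window = window[1:] + [nxt]      # forecasts stand in for unobserved values
    return path


# Hypothetical coefficients and data, chosen only so that the scheme is damped.
a = [-0.5, 0.3, 0.0, 0.0]
m = 10.0
history = [12.0, 9.0, 11.0, 14.0]
path = forecast_path(history, a, m, 60)
```

With these values the forecasts oscillate with decreasing amplitude
around the mean level m, in agreement with the damped forecast curve
described in the text.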

The two forecast curves in fig. 12 yield a good illustration of
the prognosis situation in an approach of linear autoregression.
Firstly, while a forecast $F_t[\zeta(t+k)]$ is often rather efficient
for small $k$-values, the efficiency vanishes asymptotically as $k$
increases (cf. p. 165). Further, as soon as we are in a position to
take a new observation $\zeta_{t+1}$ into consideration when forming
the prognosis, the forecast curve is often substantially modified;
how much, will depend on the residual
$\eta_{t+1} = \zeta_{t+1} - F_t[\zeta(t+1)]$. Summing up, it is the
short forecasts that are efficient. In this respect, we meet the
same situation as in the scheme of moving averages, and the same
contrast to the scheme of hidden periodicities (cf. p. 168). On the
other hand, under special circumstances the oscillations in a scheme
of linear regression are nearly functional, viz. nearly strictly
periodic; as remarked in discussing the sinusoidal limit theorem of
E. Slutsky (cf. p. 120), processes of hidden periodicities can be
obtained as limit cases of the schemes of linear autoregression.

As pointed out in section 21, the scheme of linear autoregression
constitutes the proper starting point when studying oscillatory
mechanisms which are subjected to random impulses. A typical
approach of this kind is formed by the complete systems as dealt
with in several recent economic studies (see e. g. R. Frisch (1933),
J. Tinbergen (1937)). A simple example of a complete system is
given by

$$\xi(t) = c_1\,\zeta(t-1) + \eta'(t), \qquad \zeta(t) = d_0\,\xi(t) + d_1\,\xi(t-1) + \eta''(t). \tag{320}$$

For instance, as a first approximation we may take the production
volume $\xi(t)$ of a commodity to be linearly correlated with the
price $\zeta(t-1)$, and the price $\zeta(t)$ to be linearly built up
by the production volumes $\xi(t)$ and $\xi(t-1)$.

In this connexion, the pertinent thing is that a complete system
may be reduced to a single relation involving but one of the
fundamental variables $\xi$, $\zeta$, etc. Considering e. g. the
simple case (320), we get at once


$$\zeta(t) + a_1\,\zeta(t-1) + a_2\,\zeta(t-2) = \eta(t), \tag{321}$$

where $a_1$ and $a_2$ are constants, and $\eta$ is linear in the
variables $\eta'$ and $\eta''$.
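
The reduction of the complete system to a single relation can be
checked numerically. The sketch below assumes, following the verbal
description, that the production volume is a constant c times the
previous price plus a shock, and that the price is d0 times the
current volume plus d1 times the previous volume plus a second shock;
the name c and all numerical values are hypothetical. Substituting
the first relation into the second gives a1 = -c*d0, a2 = -c*d1 and a
right member d0*e1(t) + d1*e1(t-1) + e2(t).

```python
import random


def simulate_system(c, d0, d1, n, seed=0):
    """Simulate volume xi(t) = c*zeta(t-1) + e1(t) and
    price zeta(t) = d0*xi(t) + d1*xi(t-1) + e2(t), starting from zero."""
    rng = random.Random(seed)
    xi, zeta, e1, e2 = [0.0], [0.0], [0.0], [0.0]
    for t in range(1, n):
        s1, s2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = c * zeta[t - 1] + s1
        z = d0 * x + d1 * xi[t - 1] + s2
        xi.append(x)
        zeta.append(z)
        e1.append(s1)
        e2.append(s2)
    return xi, zeta, e1, e2


c, d0, d1 = -0.8, 0.5, 0.4           # hypothetical constants
xi, zeta, e1, e2 = simulate_system(c, d0, d1, 500)

a1, a2 = -c * d0, -c * d1            # constants of the reduced relation
# Largest discrepancy between the two members of the reduced relation.
max_gap = max(
    abs(zeta[t] + a1 * zeta[t - 1] + a2 * zeta[t - 2]
        - (d0 * e1[t] + d1 * e1[t - 1] + e2[t]))
    for t in range(2, 500)
)
```

Since the right member involves e1(t) and e1(t-1) jointly, it is a
moving average of the shocks rather than a purely random series.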

Of course, in order to study a complete system in detail, we must
consider stochastical processes in several dimensions, a
generalization not in the program of the present study. However, it
is evident that theorem 9 applies directly to the reduced relations
exemplified by (321). Thus, under general conditions concerning
the variables $\eta(t)$ and the constants $(a)$, the variables
$\zeta(t)$ form a stationary process. In point of principle, we are
in a position to investigate the properties of this process
$\{\zeta(t)\}$. Considering e. g. the autocorrelation coefficients,
it follows without difficulty that these will satisfy a linear
difference equation with a right member.

It is seen that the theory of linear autoregression as developed
in sections 24 and 25 covers the case when the complete system
reduces to a relation with a purely random right member. We
have seen that this analysis has given certain results which cannot
be reached by functional methods. For instance, if the left member
of the relation (321) is characterized by an intrinsic damped
oscillation, and if the damping is too heavy, the tendency to
periodicity cannot be distinguished in the mechanism as subjected
to random impulses. Another example of this is given by the
approach (319), which presents two intrinsic periods. Having
already mentioned that one of these equals 7.5 years, a short
calculation will verify that the other period is 3.44 years. Now, as
shown in fig. 13, the root corresponding to the latter period is
lying rather near the centre of the unit circle, which implies that
the damping is rather heavy. Thus, even if the period 3.44 is
quantitatively reliable and significant (which seems to me rather
doubtful), we cannot conclude without further analysis that this
period can be found by a periodogram construction or similar
methods. Be that as it may, as pointed out before I attach no
importance to the quantitative significance of the above analysis of
the Myrdal index.
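
The connexion between a complex root, the intrinsic period and the
damping may be illustrated by a short computation. The sketch treats
a second-order relation with characteristic equation
z^2 + a1*z + a2 = 0; only the periods 7.5 and 3.44 are taken from the
text, whereas the moduli 0.9 and 0.4 are hypothetical values standing
for a root near the periphery and a root near the centre of the unit
circle.

```python
import cmath
import math


def quadratic_roots(a1, a2):
    """Roots of the characteristic equation z^2 + a1*z + a2 = 0."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def period_and_damping(root):
    """Period 2*pi/|arg(root)| and damping factor |root| per time unit."""
    angle = abs(cmath.phase(root))
    period = math.inf if angle == 0.0 else 2.0 * math.pi / angle
    return period, abs(root)


def coefficients_from(period, modulus):
    """a1, a2 giving the conjugate roots modulus*exp(+-2*pi*i/period)."""
    theta = 2.0 * math.pi / period
    return -2.0 * modulus * math.cos(theta), modulus ** 2


results = []
for period, modulus in [(7.5, 0.9), (3.44, 0.4)]:   # moduli are hypothetical
    a1, a2 = coefficients_from(period, modulus)
    p, r = period_and_damping(quadratic_roots(a1, a2)[0])
    # r ** p is the fraction of the amplitude left after one full period
    results.append((p, r, r ** p))
    print(f"period {p:.2f}, modulus {r:.2f}, amplitude after one period {r ** p:.3f}")
```

Under these assumed moduli the oscillation with the short period has
lost almost all of its amplitude after a single period, which
illustrates why such a period may escape a periodogram analysis.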

Starting from explicit assumptions about the relations between
the different variables in an economic system, E. Lundberg (1937)
has examined how the system will develop from hypothetical initial
conditions. Since several of the relations assumed are non-linear,
his approach may be looked upon as a generalization of the linear
systems as exemplified by (320). Now, the analysis of E. Lundberg
shows that the variables will often present tendencies to diverge
as an economic expansion goes on, tendencies causing tensions
which make the economic system unstable. Of course, in point of
principle it would be possible to apply stochastical methods even
in this approach. However, it seems extremely difficult to obtain
in this way a non-evolutive scheme for the system considered.
Anyhow, in view of the investigations of E. Lundberg, the linear
approaches earlier discussed seem far from sufficient for giving an
adequate hypothetical model for studying the economic cycles in
detail. Looking upon the approach (319) from this viewpoint, it
might be said that even if this scheme does correctly sum up the
main features of the index examined, the interpretation in terms
of oscillatory mechanisms which is suggested by such a simple
approach cannot possibly be completely realistic.¹⁷

Having in the previous section mentioned certain geophysical
phenomena which invite study on the basis of the scheme of
moving averages, this section will be terminated by a few suggestions
about the wide applicability of the scheme of linear autoregression.

As surveyed in section 29, G. U. Yule (1927) introduces the
concept of autoregression in studying the 11 year wave in the
sunspot numbers. Further, since the criterion of J. Bartels (1935)
(cf. p. 26) suggests that certain waves in terrestrial magnetism are
“quasi-persistent”, the construction of this criterion makes us expect
that in these cases an autoregression approach will be fruitful. Of
course, here the scheme of linear autoregression suggests itself also
a priori. For instance, let us consider the 27 day wave, which
is due to the sunspot intensity and the rotation of the sun. The
duration of a sunspot often being rather long, the sunspot intensity
as observed in a time point $t$, say $\xi_t$, must be positively
correlated with the intensity 27 days earlier. Having stated this, it
seems plausible that a correlogram of terrestrial magnetism will
present an oscillation with a period of about 27 days. We must also
expect that the oscillation in the empirical correlogram will be
damped, and that the degree of damping will depend on the
average duration of a sunspot.

It is not difficult to point out other instances where the scheme
of linear autoregression seems plausible on theoretical grounds.
Even the simple scheme (237) involving but one parameter might
often prove useful, at least as a first approximation. For instance,
we have already found that the scheme (284) presents a correlogram
which fits rather well to the correlogram of air pressure examined
by Sir G. Walker (1931). In this case, the autoregression obviously
may be interpreted as an effect of inertia; generally speaking, we
may always expect an autoregression of type (237), and with a
positive constant $p$, when dealing with phenomena characterized by
irreversibility. According to formula (239), the expectation
corresponding to a long period is in such cases larger than in a
purely random series. In other words, there will be a tendency to
spurious periodicity, a tendency of the observational time series to
present long waves the lengths of which are varying and without
physical significance.

Finally, it is evident that the theory of autocorrelation may be
applied to the functional transform (21) used by N. Wiener (1930)
in the theory of light. Following up this idea, we are led to
interpret the transmission of light as a stationary process. With
suitable arrangements about the dispersion of the process, the
expression (21) would then correspond to the autocorrelation
coefficients, while the function $S(u)$ as given by (23) would reduce
to the generating function of the autocorrelation coefficients (cf.
section 17). In this approach, the continuous parts of the spectrum
of light would correspond to the continuity intervals of the
generating function. Perhaps the scheme of linear autoregression
might serve as a starting point for investigations on these lines,
for according to theorem 11 this scheme presents a continuous
generating function, and, as pointed out in section 25 (see p. 120),
we may obtain any scheme of hidden periodicities of type (39) as a
limit case.


Appendix 1 

Notes to the second edition 

1 The names of the various schemes and processes were in the 1st edition
introduced by way of tentative suggestions. To some extent they have been
accepted in later works [see, for instance, refs. 2, 28, 29, 40], but the
terminology is not too well established, and the author thinks it would be to
advantage to change some of the terms. For one thing, now that the pioneer
works of Yule and Slutsky can be viewed at a longer distance of time,
it would be appropriate to speak of the Yule process instead of the scheme
of autoregression, and the Slutsky process instead of the scheme of moving
averages. Further the author should like to rename the process of linear
regression, calling it instead the process of moving summation, a term proposed
by A. Kolmogoroff [ref. 33]. [p. 3]

2 Smirnoff’s test, as well as a related test by Kolmogoroff [ref. 30], has
the additional advantage that its distribution is independent of the
function $F(u)$. [p. 21]

3 This theorem is known as the statistical ergodic theorem of
Birkhoff-Khintchine. Otherwise expressed, the theorem states that if
$\xi_i(t)$ is a fixed realization of the process considered, its
average over $n$ time points, that is

$$\frac{1}{n}\sum_{v=0}^{n-1} \xi_i(t-v),$$

will with probability 1 tend to a limit as $n \to \infty$. Or in yet
another reformulation, the statement “with probability 1” means that the
limiting average

$$M[\xi] = \lim_{n\to\infty} \frac{1}{n}\sum_{v=0}^{n-1} \xi_i(t-v)$$

will exist for “almost all” realizations of the process.

The importance of Birkhoff-Khintchine’s theorem lies in the fact that,
whereas the definition of the process $\{\xi(t)\}$ refers to a universe of
realizations, the theorem makes a statement about an individual realization.
We note the dualism between the “phase average” $M[\xi]$, which is an average
over $t$ for a fixed realization, and the “space average” $E[\xi]$, which is
an average over all realizations for a fixed $t$. Since we are dealing with
stationary processes, $E[\xi]$ is the same for all $t$, whereas $M[\xi]$ will
in general vary from one realization to another. For a large class of
stationary processes, and in particular for the ergodic processes, we have
$M[\xi] = E[\xi]$ for almost all realizations, so that the dualism
disappears. [p. 35]
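
The dualism between phase average and space average can be
illustrated by simulation. The sketch below is only an example under
stated assumptions: the ergodic case is a one-parameter autoregression
with zero mean, and the non-ergodic case adds to it a random level
which is drawn once for each realization and then kept fixed; the
parameter values and names are hypothetical.

```python
import random


def phase_average(level, p, n, rng):
    """Average over n time points of xi(t) = level + u(t),
    where u(t) = p*u(t-1) + eta(t) and eta(t) ~ N(0, 1)."""
    u, total = 0.0, 0.0
    for _ in range(n):
        u = p * u + rng.gauss(0.0, 1.0)
        total += level + u
    return total / n


rng = random.Random(1)
n, p = 20000, 0.5

# Ergodic case: every realization has level 0, so the space average is 0.
ergodic = [phase_average(0.0, p, n, rng) for _ in range(5)]

# Non-ergodic case: the level is random with expectation 0 but is fixed
# within a realization, so the space average is still 0.
levels = [rng.gauss(0.0, 1.0) for _ in range(5)]
non_ergodic = [phase_average(level, p, n, rng) for level in levels]
```

In the first case all phase averages come out close to the space
average zero; in the second each phase average settles near the level
of its own realization and so varies from one realization to another.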

4 Quite generally, let

$$f_t(\xi) = f(\ldots, \xi(t-1), \xi(t), \xi(t+1), \ldots)$$

be a function of random variables $\xi(t)$ that constitute a stationary
process $\{\xi(t)\}$. Then

$$f_{t+k}(\xi) = f(\ldots, \xi(t+k-1), \xi(t+k), \xi(t+k+1), \ldots)$$

is obtained from $f_t(\xi)$ by moving all variables $\xi$ simultaneously $k$
steps to the left. We see that $f_{t+k} = f_{t+k}(\xi)$ is a well-defined
random variable for all $k$, and that

$$\{f_t\} = (\ldots, f_{t-1}, f_t, f_{t+1}, \ldots)$$

constitutes a stationary process.

We are now in a position to define the notion of ergodicity: Given a
stationary process $\{\xi(t)\}$, we consider an associated process $\{f_t\}$.
By the Birkhoff-Khintchine theorem, the phase average

$$M[f] = \lim_{n\to\infty} \frac{1}{n}\sum_{v=0}^{n-1} f_{t+v}(\xi)$$

will be a well-defined random variable. Process $\{\xi(t)\}$ is called
ergodic if, for all functions $f$ such that the expectation of $f^2$ is
finite, the phase average $M[f]$ equals the space average $E[f]$ for almost
all realizations.

Reference is made to E. Hopf, ref. 25, for the general ergodic theory.
As has been noted in section 12, the statistical implications of ergodic
theory are of fundamental importance for the applications of stationary
processes (see also Appendix notes 14 and 16). [p. 38]

5 The statements or theorems in question have direct parallels in the
theory of statistical scatters or distributions. Reference should here be made
to the pioneer work by R. Frisch, ref. 19, where matrix calculus was for
the first time systematically employed in statistics. Among the first fruits
Frisch reaped thereby were the notion of rank of a statistical scatter, and
the theorem that this equals the rank of the corresponding covariance
matrix and quadratic form. A parallel theorem on probability distributions
has been given by H. Cramér (1937) [Theorem 32 (B)]; for a more detailed
treatment, see J. Lukomski, ref. 37, and H. Cramér, ref. 10 [Ch. 22.5].
[p. 44]

6 Relations equivalent or closely related to (127)-(131) belong to the
groundwork in the “generalized harmonic analysis” of N. Wiener (1930).
Wiener is primarily concerned with the analysis of an individual
time-series, but he extends the analysis to the case of random processes,
showing that his methods can be used for a joint simultaneous analysis of the
universe of realizations of the process. As applied to stationary processes,
Wiener’s methods have in the hands of H. Cramér [ref. 9] given further
important results. (Cf. also Appendix note 16.) [p. 73]

7 More precisely, it will be resultless from the viewpoint of forecasting,
a main theme of the present work. Over a time interval of any fixed length,
the time-series considered can be graduated to any prescribed accuracy by
the use of periodogram analysis, but if $\sum |r_k|$ is convergent the
graduation will provide no valid forecast outside the interval under
analysis. Similarly, the generalized harmonic analysis of Wiener (see the
previous note) is of restricted scope from the viewpoint of forecasting.
[p. 74]

8 In ref. 64 [Ch. 12] the reader will find a detailed exposition of regression
analysis from the viewpoint of least-squares approximation in linear spaces.
[p. 76]

9 For a detailed treatment of this and related theorems, see ref. 64
[Theorems 12.3.2-3]. [p. 78]

10 We have seen that the singular process as defined in section 14 is a
straightforward extension of Frisch’s notion of singular (or collinear, in
Frisch’s terminology, ref. 19) distributions in a finite number of dimensions.
The extension to processes, however, brings in essentially new features, as
seen from our Theorem 2 [p. 45] and from our remarks in section 16 [p. 64].
A further extension is involved in the present definition of singularities
of infinite rank. It will be noted that our definition of the singular process
is framed with a view to the possibilities of forecasting. In fact, whether the
singularity is of finite or infinite rank we see that such a process can be
forecasted with any prescribed accuracy on the basis of its past
development. More precisely, let
$[\xi_i(t), \xi_i(t-1), \xi_i(t-2), \ldots]$ denote an arbitrary
realization of a singular process $\{\xi(t)\}$; then “almost all”
realizations are such that if $\xi_i(t-1), \xi_i(t-2), \ldots$ are known we
can calculate a forecast of $\xi_i(t)$ with a forecast error which has zero
expectation and a variance which can be made arbitrarily small.

A. Kolmogoroff, refs. 31, 32, has linked up the singularity of a stationary
process $\{\xi(t)\}$ with the properties of its spectral function
$W(\lambda)$ as defined by formula (116). Specifically, he has shown that a
necessary and sufficient condition for $\{\xi(t)\}$ to be singular of finite
or infinite rank is that the integral

$$\int_0^{\pi} |\log W'(\lambda)|\, d\lambda$$

should be infinite.

Regarding the terminology, the author suggested the term singular process
in view of the relationship with singular matrices and quadratic forms.
Better and by now universally accepted is the term deterministic process
introduced by J. Doob, ref. 15. Finally we note the alternative term harmonic
process [see ref. 64] for the process of superposed harmonics dealt with in
Theorem 2. Thus a deterministic process is either a harmonic process or
a singular process of infinite rank. [p. 81]

11 This theorem has been deepened and generalized in a brilliant work by
A. Kolmogoroff, ref. 33. For comment, see Appendix 2 [cf. also ref. 64,
Ch. 12.5-7]. [p. 89]

12 It will be noted that Theorem 8 is not restricted to the case when all
roots of the characteristic equation of (194) lie within the periphery of the
unit circle. [p. 97]

13 The prediction theory of stationary processes is the subject of important
works by Kolmogoroff, refs. 31, 32, and Wiener, ref. 60. See also
Appendix 2. [p. 103]


14 In the applications of stationary processes, generally speaking, we are
given a time series from which we wish to extract information about the
process of which the given series is regarded as a realization. Our device
for dealing with this inference problem is to regard the observed serial
coefficients $\hat r_k$ as large-sample estimates of the theoretical
autocorrelation coefficients $r_k$. The rationale of this device is embodied
in the ergodic theorem of Birkhoff-Khintchine [Section 12]; in fact,
$\hat r_k$ and $r_k$ are built up by first and second order moments defined
as phase averages and space averages, respectively, and for an ergodic
process the two types of average are asymptotically equal. The
Birkhoff-Khintchine theorem thus is seen to constitute an essential
generalization of the large-sample theorems of classical statistics. For
further comments on the statistical implications of ergodic theory, see
ref. 64 [Chs. 9.4, 10.4, 11.1].

In recent years the theory of time-series inference has made rapid progress.
Briefly stated, we may distinguish two stages in the development. In the
first stage the typical problem is to establish the distribution of
$\hat r_k$, or of any other parameter estimate, on the basis of specific
assumptions about the generating process. In the second stage the inference
is not restricted to a single parameter, but refers to the entire structure
of the generating process. For the treatment of the deep-lying problems of
the second type, a new line of approach has recently been opened up by
P. Whittle, refs. 54-59. Constructed on a least-squares basis, Whittle’s
methods have certain optimal properties as large-sample procedures, and they
lead to a uniform treatment of large classes of stationary processes,
including the processes of moving summation. Whittle gives an expository
survey of his methods in Appendix 2, in which also some fresh results are
incorporated. [p. 109]

15 Another case of interest is $p = q$, giving $b_k = (k+1)\,p^k$. (Cf. also
p. 145.) [p. 113]

16 In Appendix notes 3, 4 and 14 we have touched upon the fact that our
theoretical analysis in Chapters II-III and in particular the notion of
theoretical correlogram refers to a stochastic process, i.e. to a universe of
hypothetical time-series (realizations), whereas in the applications we are
usually concerned with a single time-series, and in particular the empirical
correlogram is usually based on a single series. For ergodic processes the
dualism between universe and single realization is not essential, for we
know that an individual realization will then suffice to give full information
about the whole universe.

The dualism between universe and realization has been discussed from
statistical viewpoints in a later paper, ref. 62 (see also ref. 64, Ch. 12.5-7).
On the assumption that the time-series under analysis is given from the
infinite past up to a fixed time-point $t_0$, it is shown that the series can
be subjected to a decomposition analogous to that in Theorem 7. The
decomposition applies, in particular, to an individual realization of a
stationary process. For ergodic processes, this decomposition will with
probability 1 coincide with that given by Theorem 7. For non-ergodic
processes, on the other hand, the two decompositions will in general differ,
the one based on an individual realization giving components the structure of
which may vary from one realization to another. From the viewpoint of the
applications it is important to note, however, that the information extracted
from the past of an individual realization can always be employed for
forecasting the future development of the same realization. [p. 147]

17 Stochastic processes in several dimensions or variables form a wide field
of research, which in recent years has been explored in several directions;
reference is made to the fundamental works of H. Cramér, ref. 8, V.
Zasuhin, ref. 65, H. B. Mann and A. Wald, ref. 39, and P. Whittle, ref. 59.
The scheme (320) falls under the heading of the recursive process, a type of
multidimensional process which merits particular attention because of its
general scope in the applications. The pioneer on this line is Tinbergen
(1937), who has followed up the approach in further important
investigations [refs. 47, 48]. Later, the theory and the application problems
of the recursive process have been studied in some detail by the author [see
R. Bentzel and H. Wold, ref. 6, and refs. 62, 63; the results have been
summed up in ref. 64 (Chs. 3.2 and 12.7). See also refs. 5, 49, 50]. [p. 189]



Appendix 2 
by P. Whittle 

Some recent contributions to the theory of stationary processes

In view of the necessarily short compass of this article it has been deemed 
better to include as a rule only such work as fits into a uniform treatment, 
rather than to attempt a faithful review of later years’ literature. Further, 
while there are aspects of neighbouring fields, such as the more general 
theory of stochastic processes, which are far from irrelevant to the present 
discussion, they must regretfully be left to such treatments as can do them 
justice. 


The two theories whose application has most helped time series analysis
during the last fifteen years are those of spectral analysis and statistical
inference.

Referring to equation (115) of Professor Wold’s work, spectral theory
may be described as the study of the function $W(x)$, together with its
exploitation to classify the different types of process, and to deduce the
existence of stochastic relations among the process variates. This last
aspect, the deduction of stochastic relations, includes what is generally
known as prediction theory. It is common enough that we assume some
relation to hold between the variates (e.g., an autoregression) and then
calculate the spectrum, etc. Prediction theory goes rather in the opposite
direction, and seeks to establish such relations, given the spectrum. Wold’s
decomposition (Theorem 7) was the first proof that such a relation exists for
a general class of cases. Of course, the ultimate aim of most time series
analyses is to obtain a forecast, so that the idea of prediction is latent in
the whole subject.

It is useful to keep in mind the intuitive meaning of $W(x)$. If the series
is regarded as describing some electrical or mechanical oscillation, then
$W(x)$ is the theoretical distribution function of the oscillation power
among the different frequency components. This distribution gives
surprisingly detailed information on the mechanism of generation of the
series; hence its importance.

We use the word “inference” in its usual sense: the endeavour to form 
a general hypothesis from observed facts; in this case, the setting up of a 
hypothetical model to explain as nearly as possible the generation of a 
given series. More specifically, in statistical inference one sets up criteria 
measuring the degree of agreement of hypothesis and observation (tests of 
fit), or discriminating between different hypotheses (discriminatory tests), 
and one endeavours to estimate numerical quantities involved in the hypo- 
theses. 

The theory of inference is a strange mixture of mathematical and
non-mathematical considerations. It is mathematical insofar as criteria
such as those mentioned above may be constructed, even though they
may often have a flavour of arbitrariness. But when we ask which hypotheses
shall be tested, and what a priori weightings shall be given them, then we
enter a realm which is perhaps inaccessible to exact method, and where
full certainty can never be reached. The pursuit of these questions would
be profitless here, however, and we shall not continue it until we are in a
position to do so more exactly.

Chapter 1. Spectral theory 
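1.1. Spectral representation

It is convenient to modify slightly the notation of Wold’s § 17, so that
equations (114), (115) become

$$\rho(u) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{iu\omega}\, dW(\omega) \tag{1}$$

$$\rho_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\omega}\, dW(\omega) \tag{2}$$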

the spectral representations of the autocovariance function of a continuous 
and of a discrete process, respectively. As in that section, the inverse 
relations hold: 

$$W'(\omega) = \int_{-\infty}^{+\infty} \rho(u)\, e^{-iu\omega}\, du \tag{3}$$

$$W'(\omega) = \sum_{k=-\infty}^{+\infty} \rho_k\, e^{-ik\omega} \tag{4}$$

It is already evident that the theories of a continuous and a discrete
process are very similar. The former is actually the more general, but if
we nevertheless restrict ourselves to the discrete case, we shall avoid
mathematical difficulties which are most often irrelevant to the physical
problem, and still have a treatment general enough for the overwhelming
majority of practical cases.
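
For the discrete case the pair of relations between autocovariance
and spectrum can be checked numerically. The following sketch assumes
an absolutely continuous spectrum and the convention that the
autocovariance of lag k equals 1/(2*pi) times the integral of
cos(k*omega) W'(omega) over (-pi, pi); the example process, a moving
average x(t) = eta(t) + 0.5 eta(t-1) with uncorrelated shocks of unit
variance, and the function names are hypothetical.

```python
import math


def spectral_density(rho, omega):
    """W'(omega) as the sum of rho_k e^{-ik omega} over all k, for a
    symmetric autocovariance given by rho[0], rho[1], ... (zero beyond)."""
    return rho[0] + 2.0 * sum(r * math.cos(k * omega)
                              for k, r in enumerate(rho) if k > 0)


def autocovariance(density, k, grid=2048):
    """rho_k = (1/2pi) * integral over (-pi, pi) of e^{ik omega} W'(omega),
    by the midpoint rule; the imaginary part vanishes by symmetry."""
    step = 2.0 * math.pi / grid
    total = 0.0
    for j in range(grid):
        omega = -math.pi + (j + 0.5) * step
        total += math.cos(k * omega) * density(omega)
    return total * step / (2.0 * math.pi)


b = 0.5
rho = [1.0 + b * b, b]               # rho_0 = 1 + b^2, rho_1 = b, rho_k = 0 else


def density(omega):
    return spectral_density(rho, omega)


recovered = [autocovariance(density, k) for k in range(4)]
```

The recovered values reproduce rho_0 = 1.25 and rho_1 = 0.5 and give
zero for the higher lags, as the inverse relation requires.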

Now, from the Fourier representation of the autocovariance (2) Cramér
has deduced a similar representation of the process variate

$$x_t = \int_{-\pi}^{\pi} e^{it\omega}\, dy(\omega) \tag{5}$$

[see ref. 9]. The equality (5) must be understood to hold as a limit in mean
square. $\{y(\omega)\}$ is also a stochastic process, a so-called process of
noncorrelated increments. That is,

$$E[\{y(\omega_1) - y(\omega_2)\}\{y(\omega_3) - y(\omega_4)\}] = 0 \quad \text{if } \omega_1 > \omega_2 \geq \omega_3 > \omega_4. \tag{6}$$

This has as consequence that, at least when $\omega_1$ and $\omega_2$ are
continuity points of $W(\omega)$,

$$W(\omega_1) - W(\omega_2) = 2\pi\, E[\,|y(\omega_1) - y(\omega_2)|^2\,] \quad (\omega_1 \geq \omega_2). \tag{7}$$

That is, the increment of the spectrum in a certain interval is proportional
to the mean square of the corresponding $y$ increment in that interval.
Should the spectrum there not increase at all, then the $y$ increment in any
part of that interval is equal to zero, in mean square.

1.2. Decomposition of the variate

Cramér’s representation (1.1.5) may be regarded as the limiting case
of a graduation of the process in terms of harmonic functions of time; more
precisely, a graduation over a finite interval $(-T, +T)$ is obtained by
Fourier analysis, and then the interval is extended by letting
$T \to \infty$. Wold’s representation (Theorem 7) is on the other hand
derived from a representation of the process in terms of its own past. Thus,
briefly stated, Cramér proceeds in terms of graduation, Wold in terms of
forecasting. In view of the apparent divergence of these two approaches, it
is of interest to note that they lead to results which are very nearly
related, although it is no trivial matter to establish precisely what is
equivalent and what is different. We shall briefly review some work of
Cramér, Kolmogoroff and Wiener [refs. 9, 31-33, 60; see also Wiener (1930)]
in the field.

The values of the derivative of the spectrum between 0 and $2\pi$ constitute
in effect the assembly of eigenvalues of the infinite covariance matrix of
the process $(\ldots, x_{t-1}, x_t, x_{t+1}, \ldots)$ [see ref. 45 p. 201,
ref. 54 p. 36]. Since a finite set of statistical variates identically obeys
some linear relation if its covariance matrix possesses one or more zero
eigenvalues, we may expect the general result that the $\{x_t\}$ process will
be singular (deterministic) in the sense of section 14 if this derivative be
anywhere zero, i.e. if the spectrum be anywhere nonincreasing. We are dealing
with an infinite number of variates, however, and a continuous assembly of
eigenvalues, so that the actual results are in general of a more
sophisticated nature.

Cramér [ref. 9] proceeded from the classical decomposition of a monotone
increasing function

$$W(\omega) = W_1(\omega) + W_2(\omega) + W_3(\omega). \tag{1}$$

Here $W_1(\omega)$, $W_2(\omega)$ and $W_3(\omega)$ are nondecreasing, and
are, respectively, an absolutely continuous function, a step function, and a
so-called singular function, continuous, but constant almost everywhere.
Define now the three processes $y_1(\omega)$, $y_2(\omega)$, $y_3(\omega)$
such that $y_j(\omega)$ is constant except at the points of increase of
$W_j(\omega)$ $(j = 2, 3)$, where it has an increment equal to that of
$y(\omega)$; $y_1(\omega) = y(\omega) - y_2(\omega) - y_3(\omega)$. Then the
process variate may be decomposed into

$$x_t = x_1(t) + x_2(t) + x_3(t) = \sum_{j=1}^{3} \int_{-\pi}^{\pi} e^{it\omega}\, dy_j(\omega). \tag{2}$$

It follows from (1.1.6) that the three components are uncorrelated, and
that their spectra are $W_1(\omega)$, $W_2(\omega)$ and $W_3(\omega)$
respectively.

If $y_2(\omega)$ has a finite number of points of increase, $x_2(t)$ is the
sum of a finite number of harmonic terms, and so, as Wold reasons,
constitutes a so-called singular or deterministic component. In general,
however, the series of discontinuities of $y_2(\omega)$ will be denumerable,
but it may be shown [ref. 31; cf. also ref. 44] that $x_2(t)$ is still
deterministic. The component $x_3(t)$ is not so easily characterised, but may
also be shown to be deterministic, as the largely constant nature of its
spectrum would lead us to expect.

We see thus that both $x_2(t)$ and $x_3(t)$ must be relegated to $\psi(t)$,
the deterministic component of Wold’s representation. Of the remaining
component, $x_1(t)$, it may be said that it is either purely nondeterministic
or deterministic (nonsingular or singular, in Wold’s terminology) according
as the condition

$$\int_{-\pi}^{\pi} |\log W_1'(\omega)|\, d\omega < \infty \tag{3}$$

is fulfilled or not (this is Kolmogoroff’s result, refs. 31, 32). To see this,
we define

$$Y(\omega) = \int_{-\pi}^{\omega} \frac{dy_1(\omega)}{\sqrt{W_1'(\omega)}} \tag{4}$$

and

$$\eta_t = \int_{-\pi}^{\pi} e^{it\omega}\, dY(\omega). \tag{5}$$

It is readily verified that the $\eta_t$ variates are mutually uncorrelated
and have unit variance, and that

$$x_1(t) = \int_{-\pi}^{\pi} e^{it\omega} \sqrt{W_1'(\omega)}\, dY(\omega) = \sum_{s=-\infty}^{+\infty} d_s\, \eta_{t-s} \tag{6}$$

if $\sqrt{W_1'(\omega)}$ is expanded in a formal Fourier series
$\sum_{s=-\infty}^{+\infty} d_s e^{-is\omega}$. That is, a process with
absolutely continuous spectrum may be represented as a moving average of a
sequence of uncorrelated variates. Conversely, a process thus representable
must have absolutely continuous spectrum [ref. 33; see also ref. 16].

If condition (3) is fulfilled, then the representation (6) may be specialised 
to that of a one-sided moving average, where $ assumes values from 0 to 4 - oo 
so that x x (t) is purely nondeterministic (it is in this case the component £(£) 
of Wolds decomposition). For then log TFi(a)) may be expanded in a 
symmetric Fourier series 

log Wi(co)= ^Zc s e~ ia)S (7) 

— 00 

and, defining 

L c s e ~ icoS 

0 (cu) = e 1 =1 + b 1 e- i< ° + b i e- Zia, + ■■■ (8) 

we have 

W[ (co) = e c °d(co)6(- co ). (9) 

That is, we have succeeded in breaking Wi (co) up into two conjugate factors, 
one of which may be expanded in nonpositive powers of e 1 w . Defining now 

—71 

71 

rjt= je tot dY(o>) 

-71 

we find as before that the &re uncorrelated (although they have now 
variance e °, a consequence of the fact that the leading coefficient in (8) 
was chosen as unity) and that 


( 10 ) 

( 11 ) 


J 


dg i (co) 
VW[ (co) 


( 4 ) 
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x 1 (t) = je ,ta 6 (to) dY{co)=J,b s rj t . s . (12) 

-n 0 

This is the required representation. 

If, on the other hand, condition (3) should be violated, then $x_1(t)$ is
actually deterministic, despite the apparent nondeterminacy of
representation (6). This may be roughly seen by representing the process with spectrum
$c\omega + W_1(\omega)$ ($c$ a constant) as an infinite autoregression. Upon letting $c$
tend to zero, it is found that the variance of the error term also tends to
zero, so that $x_1(t)$ is determined by past values, although the autoregression
may not exist as such.
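
The factorisation described by equations (7) to (9) can be carried out numerically. The following is a minimal sketch and not part of the original text: it assumes the spectral function is sampled on a uniform grid of frequencies and that NumPy is available. The function name `factorise_spectrum` and the test spectrum $2\,|1 + 0.5e^{-i\omega}|^2$ are illustrative choices only.

```python
import numpy as np

def factorise_spectrum(spec):
    """Split a sampled spectral function into exp(c_0) and one-sided coefficients b_s.

    `spec` holds W'(omega_k) at omega_k = 2*pi*k/N, k = 0..N-1 (N even, spec > 0).
    """
    n = len(spec)
    # c_s such that log W'(omega) = sum_s c_s exp(-i*omega*s), cf. (7)
    c = np.fft.ifft(np.log(spec)).real
    # keep only the terms s = 1 .. N/2 - 1, i.e. the one-sided half of the series
    c_plus = np.zeros(n)
    c_plus[1:n // 2] = c[1:n // 2]
    theta = np.exp(np.fft.fft(c_plus))   # theta(omega_k), cf. (8)
    b = np.fft.ifft(theta).real          # theta = sum_s b_s exp(-i*omega*s)
    return float(np.exp(c[0])), b

N = 1024
omega = 2 * np.pi * np.arange(N) / N
spec = 2.0 * np.abs(1 + 0.5 * np.exp(-1j * omega)) ** 2
v, b = factorise_spectrum(spec)
```

For this test spectrum the sketch should recover $e^{c_0}$ close to 2 and coefficients close to $b_0 = 1$, $b_1 = 0.5$, with the later $b_s$ negligible.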

1·3. Linear operations

We saw in the previous section that the general purely nondeterministic
process could be represented as a one-sided moving average

$$x_t = \sum_{s=0}^{\infty} b_s \eta_{t-s}. \qquad (1)$$

Now, the forming of this moving average may be regarded as an operation
upon the $\eta$ process, an operation which is summed up in the function
$\theta(\omega)$. For instance, the operation of a finite moving average

$$x_t = \sum_{s=0}^{p} b_s \eta_{t-s} \qquad (2)$$

corresponds to $\theta(\omega) = b_0 + b_1 e^{-i\omega} + \cdots + b_p e^{-ip\omega}$, while the autoregression

$$x_t - \rho x_{t-1} = \eta_t \quad \text{or} \quad x_t = \eta_t + \rho\,\eta_{t-1} + \rho^2 \eta_{t-2} + \cdots \qquad (3)$$

corresponds to $\theta(\omega) = 1 + \rho e^{-i\omega} + \rho^2 e^{-2i\omega} + \cdots = (1 - \rho e^{-i\omega})^{-1}$.

This is one of the great virtues of spectral analysis, that a linear operation 
upon a process is equivalent to a certain function. Thus, the calculus of such 
operations may be interpreted in terms of a much more familiar calculus 
of functions. 

Suppose that we represent the operation by $L$, i.e.

$$L(\eta_t) = b_0 \eta_t + b_1 \eta_{t-1} + b_2 \eta_{t-2} + \cdots \qquad (4)$$

and write the equivalence as $L \simeq \theta(\omega)$. Then the following rules are fairly
simply deduced from equation (11) of the preceding section:

$$L_1 + L_2 = L_2 + L_1 \simeq \theta_1(\omega) + \theta_2(\omega) \qquad (5)$$

$$L_1 L_2 = L_2 L_1 \simeq \theta_1(\omega)\, \theta_2(\omega) \qquad (6)$$

$$L^{-1} \simeq [\theta(\omega)]^{-1}. \qquad (7)$$

Equation (7) holds as it stands only if $L^{-1}$ is also of type (4), i.e. if $[\theta(\omega)]^{-1}$
has also a Taylor expansion in $e^{-i\omega}$.

The following may serve as an example of the application of these rules.
Consider the autoregressive scheme

$$x_t + a_1 x_{t-1} + \cdots + a_p x_{t-p} = \eta_t. \qquad (8)$$

This equation may be written

$$L^{-1} x_t = \eta_t. \qquad (9)$$

Comparing (8), (9) we see that

$$L^{-1} \simeq 1 + a_1 e^{-i\omega} + \cdots + a_p e^{-ip\omega}. \qquad (10)$$

Thus, by (7)

$$L \simeq (1 + a_1 e^{-i\omega} + \cdots + a_p e^{-ip\omega})^{-1} \qquad (11)$$

which gives $\theta(\omega)$ for the scheme (8). Since all singularities of $\theta(\omega)$ are in the
upper half-plane, the application of (7) is valid.
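
Rules (6) and (7) translate directly into arithmetic on coefficient sequences: multiplying two operators amounts to convolving their coefficients, and inverting an operator amounts to inverting a power series in $e^{-i\omega}$. The sketch below is illustrative only and not part of the original text; the function name `ar_to_ma` and the numerical coefficients are hypothetical, and NumPy is assumed.

```python
import numpy as np

def ar_to_ma(a, n_terms):
    """First n_terms coefficients b_0, b_1, ... of 1 / (1 + a_1 u + ... + a_p u^p),

    where u stands for exp(-i*omega); `a` is the list [a_1, ..., a_p] of scheme (8).
    """
    b = np.zeros(n_terms)
    b[0] = 1.0
    for k in range(1, n_terms):
        # matching the coefficient of u^k in (1 + a_1 u + ...)(b_0 + b_1 u + ...) = 1
        for j in range(1, min(k, len(a)) + 1):
            b[k] -= a[j - 1] * b[k - j]
    return b

a = [-0.9, 0.2]                       # x_t - 0.9 x_{t-1} + 0.2 x_{t-2} = eta_t
b = ar_to_ma(a, 50)                   # moving average coefficients of L
check = np.convolve([1.0] + a, b)[:50]  # rule (6): L^{-1} L should be the identity
```

Here `check` should equal 1 followed by zeros, which is the statement $L^{-1}L \simeq 1$ expressed in coefficients.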

We see from (6) that the spectral function [i.e., the differentiated
spectrum] of the process obtained by operating upon the $\eta$ process with the
operators $L_1$, $L_2$ in turn, is

$$|\theta_1(\omega)\, \theta_2(\omega)|^2 \sigma^2(\eta) = |\theta_1(\omega)|^2\, |\theta_2(\omega)|^2 \sigma^2(\eta) \qquad (12)$$

i.e., the product of the individual spectral functions of the processes
$\{L_1(\eta_t)\}$, $\{L_2(\eta_t)\}$, apart from the factor $\sigma^2(\eta)$.

1·4. Rational spectral functions

When dealing with a discrete process, it is convenient to make the
variable transformation $z = e^{i\omega}$. Thus, for a purely nondeterministic process

$$F(\omega) = W'(\omega) = \sum_{s=-\infty}^{\infty} \rho_s z^s = \Lambda(z) \qquad (1)$$

say, and

$$\rho_s = \frac{1}{2\pi i} \oint_C \Lambda(z)\, z^{-s-1}\, dz \qquad (2)$$

where $C$ is the unit circle in the $z$ plane, positively described.

A class of processes of particular interest is that for which $\Lambda(z)$ is rational
in $z$. From the fact that $\Lambda(z)$ is rational and is real on the unit circle, we
deduce by the Schwarz reflection principle that if it has a zero (pole) at
$\alpha$, then it also possesses a zero (pole) at $\bar{\alpha}^{-1}$. Hence $\Lambda(z)$ may be written

$$\Lambda(z) = K\, \frac{\prod (z - \alpha)(\alpha z - 1)}{\prod (z - \beta)(\beta z - 1)} = K \left| \frac{\prod (z - \alpha)}{\prod (z - \beta)} \right|^2 = K \left| \frac{B(z)}{A(z)} \right|^2 \qquad (3)$$

where the polynomial $B(z)$ may be chosen so that all its roots lie within
or on the unit circle, and $A(z)$ so that all its roots lie within the unit circle
(obviously $A(z)$ cannot have a root on the unit circle, for this would cause
$W'(\omega)$ to have a singularity in $(0, 2\pi)$, against hypothesis). For a real
process, which is the only kind we consider, $\Lambda(z)$ is an even function of $\omega$,
i.e. $\Lambda(e^{i\omega}) = \Lambda(e^{-i\omega})$, so that we may suppose the polynomials $A(z)$, $B(z)$
to have real coefficients. We may thus write, for $|z| = 1$,

$$\Lambda(z) = K\, \frac{B(z)\, \overline{B(z)}}{A(z)\, \overline{A(z)}} = K\, \frac{B(z)\, B(z^{-1})}{A(z)\, A(z^{-1})}. \qquad (4)$$


Referring to section 1·2, we see that a possible choice of $\theta(\omega)$ is

$$\theta(\omega) = \frac{B(z)}{A(z)} \qquad (5)$$

which is readily seen to correspond to the operation

$$x_t + a_1 x_{t-1} + \cdots + a_a x_{t-a} = \eta_t + b_1 \eta_{t-1} + \cdots + b_b \eta_{t-b}. \qquad (6)$$

Equation (6) illustrates the structure of a process having spectral function
rational in $z$. This class embraces the two important schemes for a discrete
stationary variate distinguished by Wold: the moving average scheme and
the autoregressive scheme, given by setting $a$, $b$ respectively equal to zero.

These schemes are usually defined directly from the stochastic difference
equation (6) with the $\eta$'s defined as an independent sequence. The approach
here [due to Doob, ref. 16] has been in the reverse direction, so to
speak, but is instructive insofar as it shows that any spectral function
rational in $z$ may be regarded as a consequence of an operation of type (6).
Furthermore, so far as spectral and autocovariance theory is concerned,
the variates $\eta_t$ need only be assumed uncorrelated.

Without doubt, the importance of the rational spectral function depends
to a large extent upon its theoretical tractability, and the attractive
simplicity of (6). In technological applications the restriction of rationality is a
very natural one, however, since any electrical or mechanical filter
constructed entirely of "linear" components will perform an operation of type
(6) upon the input. (Actually, the quotient of input and output spectral
functions will be rational in $\omega$, but if we take discrete equidistant
observations it will then appear to be rational in $e^{i\omega}$.)

1·5. The intrinsic variance

We see from the decomposition

$$x_t = \psi_t + \eta_t + b_1 \eta_{t-1} + b_2 \eta_{t-2} + \cdots \qquad (1)$$

that $\eta_t$ is the random element which has entered the process in the time
interval $(t - 1, t)$, and is consequently that part of $x_t$ which cannot be
deduced from a knowledge of $x_{t-1}, x_{t-2}, \ldots$ (which in turn implies a
knowledge of $\eta_{t-1}, \eta_{t-2}, \ldots$). It is appropriate, then, that $v = \sigma^2(\eta)$ be called the
prediction variance or intrinsic variance of the process.

Now, we saw from section 1·2 that the $\eta$ variates have variance $e^{c_0}$,
where $c_0$ is the absolute term in the Fourier expansion of $\log W_1'(\omega)$.
Thus

$$v = \exp\left[ \frac{1}{2\pi} \int_0^{2\pi} \log W_1'(\omega)\, d\omega \right]. \qquad (2)$$

That is, the prediction variance is equal to the continuous geometric
mean of $W_1'(\omega)$. This equality was proved by Kolmogoroff [refs. 31, 32]
in 1941, although Szegö had earlier obtained virtually the same result
[refs. 45, 46] in a non-statistical paper.
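
The geometric-mean formula for the prediction variance is easily checked numerically. The sketch below is an illustration added to the text, not the author's computation: it assumes a spectral function of autoregressive form $v / |1 - \rho e^{-i\omega}|^2$ with hypothetical values $\rho = 0.7$, $v = 1.5$, and approximates the integral by the mean over a uniform frequency grid.

```python
import numpy as np

def prediction_variance(spec):
    """exp of the mean of log W'(omega) over a uniform grid on (0, 2*pi)."""
    return float(np.exp(np.mean(np.log(spec))))

N = 4096
omega = 2 * np.pi * np.arange(N) / N
rho, v = 0.7, 1.5
spec = v / np.abs(1 - rho * np.exp(-1j * omega)) ** 2
v_hat = prediction_variance(spec)
```

The value `v_hat` should agree with the residual variance $v$ used to build the spectrum, although the total variance of the process, $v/(1 - \rho^2)$, is considerably larger.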

We shall define the normalised spectral function, $G(\omega)$ or $M(z)$, as

$$G(\omega) = M(z) = \frac{F(\omega)}{v}. \qquad (3)$$

It is obvious that $G(\omega) = |\theta(\omega)|^2$, and furthermore that

$$\int_0^{2\pi} \log G(\omega)\, d\omega = 0. \qquad (4)$$

$G(\omega)$ is that part of the spectral function which refers to the linear operation
$L$ [see eq. (1·2·12)] and so is of primary importance. However, when we
come to the inductive side of the problem we shall find that $v$ is of
considerable importance too, since the touchstone of a hypothetical model
is that it lead to as small a prediction variance as possible.

1·6. Prediction

Wold's decomposition has the nature of an existence theorem, but as
soon as the deterministic component and the coefficients of the moving
average are specified it becomes a linear prediction formula.
The subject of prediction has been much studied in recent years,
the two outstanding contributions being undoubtedly the parallel works
of Kolmogoroff [refs. 31, 32, 33] and Wiener [ref. 60]. Wiener's work is
the less general, but the more readily applicable. The first problem he sets
himself, and the only one we shall consider here, is the extrapolation of a
stationary purely nondeterministic series. We shall record his results for the
discrete case.

If the extrapolation is one of $\alpha$ steps, then the criterion that the
difference between the forecast

$$\hat{x}_{t+\alpha} = \sum_{\nu=0}^{\infty} K_\nu x_{t-\nu} \qquad (1)$$

and reality, $x_{t+\alpha}$, have minimum variance over all realisations leads to the
equation

$$\rho_{\tau+\alpha} = \sum_{\nu=0}^{\infty} K_\nu \rho_{\tau-\nu} \qquad (\tau = 0, 1, 2, \ldots) \qquad (2)$$

which will be recognised as a limit form of the usual estimation equations
for the coefficients of an autoregressive scheme [see Wold's equations (221),
(312)].

An explicit solution is obtained for $K_\nu$ in the following manner. We
recall the function $\theta(\omega)$ of section 1·2 for which

$$W'(\omega) = e^{c_0} |\theta(\omega)|^2 = e^{c_0} |1 + b_1 e^{-i\omega} + b_2 e^{-2i\omega} + \cdots|^2. \qquad (3)$$

Let us also define

$$k(\omega) = \sum_{\nu=0}^{\infty} K_\nu e^{-i\nu\omega} \qquad (4)$$

so that

$$K_\nu = \frac{1}{2\pi} \int_0^{2\pi} e^{i\nu\omega} k(\omega)\, d\omega. \qquad (\nu = 0, 1, 2, \ldots) \qquad (5)$$

Wiener shows then that

$$k(\omega) = \frac{1}{2\pi\, \theta(\omega)} \sum_{\nu=0}^{\infty} e^{-i\nu\omega} \int_0^{2\pi} \theta(u)\, e^{i(\nu+\alpha)u}\, du \qquad (6)$$

which may alternatively be written

$$k(\omega) = \frac{\sum_{s=0}^{\infty} b_{s+\alpha}\, e^{-is\omega}}{\sum_{s=0}^{\infty} b_s\, e^{-is\omega}}. \qquad (7)$$

The prediction error variance is $e^{c_0} \sum_{s=0}^{\alpha-1} |b_s|^2$.

The $b_s$ coefficients of (3) are in fact the coefficients of a moving average
prediction formula [see eq. (1·2·12)]. In equations (6), (7) this moving
average formula is transformed to an autoregressive prediction formula.
The autoregressive form is obviously preferable, since the forecast is then
expressed directly in terms of observed quantities.

When $\alpha = 1$ the prediction error variance is equal to $e^{c_0}$, in accordance
with the result of the previous section.
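
The passage from the moving average coefficients $b_s$ to the predictor coefficients $K_\nu$ is a division of power series in $e^{-i\omega}$: the series $\sum_s b_{s+\alpha} e^{-is\omega}$ divided by $\sum_s b_s e^{-is\omega}$. The sketch below is illustrative only and not part of the original text; it assumes $b_0 = 1$, uses hypothetical function names, and takes as example a first order autoregression with $b_s = \rho^s$, for which the forecast should reduce to $\rho^\alpha x_t$.

```python
import numpy as np

def predictor_coefficients(b, alpha, n_terms):
    """K_0 .. K_{n_terms-1} from sum_s b_{s+alpha} u^s = (sum_v K_v u^v)(sum_s b_s u^s)."""
    num = np.zeros(n_terms)
    tail = np.asarray(b, dtype=float)[alpha:alpha + n_terms]
    num[:len(tail)] = tail
    K = np.zeros(n_terms)
    for n in range(n_terms):
        acc = num[n]
        # remove the contribution of the coefficients already found
        for j in range(1, n + 1):
            if j < len(b):
                acc -= b[j] * K[n - j]
        K[n] = acc / b[0]
    return K

def prediction_error_variance(b, alpha, v):
    """v * sum_{s < alpha} b_s^2, with v the residual variance exp(c_0)."""
    return v * float(np.sum(np.asarray(b[:alpha], dtype=float) ** 2))

rho, v, alpha = 0.6, 1.0, 3
b = rho ** np.arange(200)             # b_s = rho^s for x_t = rho x_{t-1} + eta_t
K = predictor_coefficients(b, alpha, 10)
err = prediction_error_variance(b, alpha, v)
```

In this example only $K_0 = \rho^3$ is different from zero, and the error variance is $1 + \rho^2 + \rho^4$ times $v$.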

It should perhaps be remarked that the numerical calculation of formulae 
such as (1) gives merely an optimum graduation of the series, and cannot 
in itself give any new insight into the generation mechanism of the process. 
Nor, indeed, can any method of analysis which proceeds purely by rote. 


Chapter 2. Inference 
2·1. Inference in time series

We shall now pass over to the inductive side of the problem: given an 
empirical series, what is the process which generated it? This is the question 
which is asked first and last; we investigate particular models only that 
we may better provide an answer. 

The subject of time series estimation and testing tends to be obscured by 
haze, and this must be due quite simply to the difficulty of evaluating the 
expectations and distributions which arise, since there is nothing in the 
actual inference problems which is special just for time series. We shall 
therefore begin with a discussion of the distribution of certain regularly 
encountered sample functions. We shall see that the estimation equations 
and test statistics are completely analogous to those occurring in the study 
of independent variates, even if they do not hold so exactly. It will be 
further remarked that the familiar least-squares criterion is well in evidence. 
The reason is partly that both spectral theory and least squares theory 
involve only the second moments of the observations, and therefore comple- 
ment one another very naturally. 

2·2. The relation between observed and residual moments

There are very few statistics in time series whose distributions may be
evaluated exactly, and approximations are the rule rather than the
exception. One of the stumbling blocks in the way of exact analysis is the "end
effect" of a finite series, which must usually be neglected, the justification
being that it is sensible only for a short series.
In this way, a number of useful results connecting the autocovariances
of the observed and residual variates may be derived directly from the
moving average representation of a purely nondeterministic series

$$x_t = \eta_t + b_1 \eta_{t-1} + b_2 \eta_{t-2} + \cdots \qquad (1)$$

or the corresponding autoregressive representation (when it exists!)

$$x_t + a_1 x_{t-1} + a_2 x_{t-2} + \cdots = \eta_t \qquad (2)$$

if we remember that

$$G(\omega) = \frac{F(\omega)}{v} = |1 + b_1 e^{-i\omega} + b_2 e^{-2i\omega} + \cdots|^2 = |1 + a_1 e^{-i\omega} + a_2 e^{-2i\omega} + \cdots|^{-2}. \qquad (3)$$

Thus, suppose that the observed series, $x_1, x_2, \ldots, x_n$, has autocovariances

$$C_s = \frac{1}{n - s} \sum_{t=1}^{n-s} x_t x_{t+s} \qquad (4)$$

and periodogram

$$f(\omega) = \frac{1}{n} \left[ \left( \sum x_t \sin \omega t \right)^2 + \left( \sum x_t \cos \omega t \right)^2 \right] \approx \sum_{s=-n}^{n} C_s e^{i\omega s} \qquad (5)$$

while the corresponding quantities for the residual series $\eta_1, \eta_2, \ldots, \eta_n$ are
$C_s^{(\eta)}$ and $f^{(\eta)}(\omega)$. We find then readily from (2) and (3) that

$$f(\omega) \approx G(\omega)\, f^{(\eta)}(\omega) \qquad (6)$$

and

$$C_s^{(\eta)} \approx \sum_{\nu} \gamma_\nu C_{\nu+s} \qquad (7)$$

where $[G(\omega)]^{-1} = \sum \gamma_s e^{i\omega s}$. Equation (7) can be rewritten

$$C_s^{(\eta)} \approx \frac{1}{2\pi} \int_0^{2\pi} \frac{f(\omega)}{G(\omega)}\, e^{i\omega s}\, d\omega. \qquad (8)$$

We shall find equations (7) and (8) especially useful later for the case $s = 0$,
when they obviously give us the residual sum of squares of the observed
series, expressed in terms of the correlogram and periodogram respectively.
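
The case $s = 0$ of relation (7) can be illustrated by simulation. The sketch below is an added illustration under stated assumptions, not part of the original text: it generates a first order autoregression $x_t + a x_{t-1} = \eta_t$ with a hypothetical coefficient $a = -0.6$ and normal residuals, and compares the mean square of the recovered residuals with the value obtained from the correlogram alone, using $\gamma_0 = 1 + a^2$ and $\gamma_{\pm 1} = a$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 5000, -0.6                     # scheme x_t + a*x_{t-1} = eta_t
eta = rng.standard_normal(n)
x = np.zeros(n)
x[0] = eta[0]
for t in range(1, n):
    x[t] = -a * x[t - 1] + eta[t]

def autocov(x, s):
    """C_s = (1/(n-s)) * sum_t x_t x_{t+s}, as in (4)."""
    m = len(x)
    return float(np.dot(x[:m - s], x[s:]) / (m - s))

# [G(omega)]^{-1} = |1 + a e^{-i omega}|^2 has gamma_0 = 1 + a^2, gamma_{+1} = gamma_{-1} = a
from_correlogram = (1 + a ** 2) * autocov(x, 0) + 2 * a * autocov(x, 1)

resid = x[1:] + a * x[:-1]            # the residuals eta_t recovered for t >= 1
direct = float(np.mean(resid ** 2))
```

The two quantities differ only through the end effect, which is of order $1/n$ here.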

A condition that (7) and (8) hold is that $[G(\omega)]^{-1}$ shall possess a Fourier
expansion, which is also the condition that representation (1) be
transformable to (2). This condition is seldom irksome in practice, although it
means that we cannot deal with processes such as

$$x_t = \eta_t - \eta_{t-1}$$

for which $G(0) = 0$, and for which (2) takes the nonconvergent form

$$\eta_t = x_t + x_{t-1} + x_{t-2} + \cdots \qquad \text{[see however ref. 61]}$$

2·3. Expressions for the sampling moments

Most of the sample functions of interest in time series analysis are
quadratic functions of the observations, whose distribution properties depend
to a large extent upon the corresponding matrix of the quadratic form.

The results of the last section can be derived in a more sophisticated
manner by matrix methods [see ref. 54]. For example, eq. (2·2·7) can be
written, for $s = 0$,

$$\eta'\eta \approx x' [M(W)]^{-1} x \qquad (1)$$

where $x$, $\eta$ are the observation and residual vectors, $M(z) = G(\omega)$ [see
section 1·5], and $W$ is the circulant matrix

$$W = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}. \qquad (2)$$

In (1), $C_s$ has been approximated by the so-called "circular autocovariance"

$$C_s = \frac{1}{n}\, x' W^s x. \qquad (3)$$


This modification of the usual definition was first introduced by Hotelling 
[see ref. 1], and is in effect an elegant method of neglecting the end effect. 
We shall apply it to calculate the cumulants of a linear function of the 
autocovariances, under the assumption that the residual variates are 
distributed normally. 
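
The circular autocovariance and the latent roots of the circulant matrix can be made concrete in a few lines. The sketch below is an added illustration, not part of the original text; the function names and the five-term series are hypothetical, and the matrix is taken to be the cyclic shift for which $(Wx)_t = x_{t+1}$ with indices read modulo $n$.

```python
import numpy as np

def shift_matrix(n):
    """Circulant W with (W x)_t = x_{t+1}, indices taken modulo n."""
    return np.roll(np.eye(n), 1, axis=1)

def circular_autocov(x, s):
    """Circular autocovariance x' W^s x / n."""
    n = len(x)
    W_s = np.linalg.matrix_power(shift_matrix(n), s)
    return float(x @ W_s @ x / n)

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
c1 = circular_autocov(x, 1)
c1_direct = float(np.dot(x, np.roll(x, -1)) / len(x))   # sum of x_t x_{t+1}, wrapped round
eig = np.linalg.eigvals(shift_matrix(5))                 # the n-th roots of unity
```

The quadratic form reproduces the wrapped-round lagged product sum, and the latent roots of $W$ are the $n$-th roots of unity, so that any function $h(W)$ has latent roots $h$ evaluated at those points.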

Consider the linear function of the autocovariances

$$\xi = x' Q(W)\, x \qquad (4)$$

where it is assumed that $Q(e^{i\omega})$ may be expanded in a symmetric Fourier
series (symmetry can always be arranged, since $x' W^s x = x' W^{-s} x$). Then,
if $E(xx') = V$, the characteristic function of $\xi$ is the determinant

$$\Phi(\theta) = |I - 2i\theta\, V Q(W)|^{-1/2} \approx |I - 2i\theta\, \Lambda(W)\, Q(W)|^{-1/2} \qquad (5)$$

since $V \approx v M(W) = \Lambda(W)$, by (1). Now, a function of $W$, say $h(W)$, has in
general latent roots $h(\varepsilon^j)$, $(j = 1, 2, \ldots, n)$, where $\varepsilon = e^{2\pi i / n}$ [see ref. 54]. Thus

$$\Phi(\theta) \approx \prod_{j=1}^{n} \left[ 1 - 2i\theta\, \Lambda(\varepsilon^j)\, Q(\varepsilon^j) \right]^{-1/2} \qquad (6)$$

so that

$$\psi(\theta) = \log \Phi(\theta) \approx -\frac{1}{2} \sum_{j=1}^{n} \log \left[ 1 - 2i\theta\, \Lambda(\varepsilon^j)\, Q(\varepsilon^j) \right] \qquad (7)$$

$$\approx -\frac{n}{4\pi i} \oint_{|z|=1} \log \left[ 1 - 2i\theta\, \Lambda(z)\, Q(z) \right] \frac{dz}{z} \qquad (8)$$

if we approximate the sum by an integral. If $Q(z) = P(\omega)$, then (8) may
be written

$$\psi(\theta) \approx -\frac{n}{4\pi} \int_0^{2\pi} \log \left[ 1 - 2i\theta\, F(\omega)\, P(\omega) \right] d\omega \qquad (9)$$

and, expanding this expression in $\theta$, we obtain the following formula for
the cumulants of $\xi$:

$$\kappa_j \approx 2^{j-1} (j-1)!\, \frac{n}{2\pi} \int_0^{2\pi} \left[ F(\omega)\, P(\omega) \right]^j d\omega. \qquad (10)$$

We shall find constant use for these two equations. Note that $n$ enters
only as a simple scale factor, i.e. $\Phi(\theta)$ is approximately of the form $[a(\theta)]^n$,
and we have reduced the consideration of an autocorrelated series to that
of an independent series.
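
The cumulant formula may be compared with the exact cumulants of a quadratic form in normal variates, $2^{j-1}(j-1)!\ \mathrm{tr}[(VQ)^j]$, in the idealised case where the covariance matrix is exactly circulant. The sketch below is an added illustration under that assumption and is not part of the original text; the spectral function, the weight function $P(\omega) = \cos\omega$ and all names are hypothetical choices.

```python
import numpy as np
from math import factorial

def cumulant_formula(F, P, j, n):
    """2^(j-1) (j-1)! (n / 2 pi) * integral of (F P)^j, the integral taken as a grid mean."""
    return 2 ** (j - 1) * factorial(j - 1) * n * float(np.mean((F * P) ** j))

n = 64
omega = 2 * np.pi * np.arange(n) / n
F = 1.0 / np.abs(1 - 0.5 * np.exp(-1j * omega)) ** 2   # autoregressive-type spectral function, v = 1
P = np.cos(omega)                                       # picks out the lag-one autocovariance

# circulant matrices whose latent roots are the sampled values of F and P
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
V = np.fft.ifft(F).real[idx]
Q = np.fft.ifft(P).real[idx]
VQ = V @ Q

exact = [2 ** (j - 1) * factorial(j - 1) * np.trace(np.linalg.matrix_power(VQ, j))
         for j in (1, 2, 3)]
approx = [cumulant_formula(F, P, j, n) for j in (1, 2, 3)]
```

With an exactly circulant covariance matrix the two lists agree to rounding error; for an actual stationary series the agreement is only approximate, the discrepancy being the neglected end effect.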

In the generalisation of (9), (10) to the multivariate case 9P(co) will be 
simply replaced by Z^P^co). 


2·4. Miscellaneous results

There are a number of miscellaneous results which should be mentioned,
although they do not fit into any uniform treatment.

Koopmans [ref. 34] proved eq. (2·3·9) for the special case $F(\omega) = 1$,
$P(\omega) = \cos \omega$, when investigating the distribution of the first autocorrelation
coefficient of a random series. Continuing along this line it was shown
[refs. 12, 43] that the frequency function of the coefficient is

$$f(r) \approx \text{const.}\, (1 - r^2)^{\frac{n-1}{2}}. \qquad (1)$$

This result is valid for any of the earlier autocorrelation coefficients of a
random series, since expression (2·3·8) is unaltered in value if $z$ is replaced
by $z^p$, $p$ integral.

Madow [ref. 38] has extended (1) to the case where the variates follow
a first order autoregression with neighbour correlation $\rho$:

$$f(r) \approx \text{const.}\, \frac{(1 - r^2)^{\frac{n-1}{2}}}{(1 - 2\rho r + \rho^2)^{\frac{n}{2}}}. \qquad (2)$$

Quenouille [ref. 41] has generalised (2) to give the simultaneous
distribution of $r_1, r_2, \ldots, r_k$, when the autoregression is of order $k$. Analogous
expressions have not yet been obtained for any more general case, to the
author's knowledge. Of course, the frequency function for any particular
statistic can in general be approximated by a type A series with the help
of the expressions for the cumulants given by our equation (2·3·10), and
there are other possibilities, all yet relatively untried (e.g., the type C
representation [ref. 24] and the method of mixtures [ref. 42]).

A pair of interesting relations are Bartlett's formulae [ref. 2] for the
covariances of the empirical autocovariances and autocorrelation coefficients

$$\operatorname{cov}(C_s, C_t) \approx \frac{v^2}{n} \left[ \lambda_{s+t} + \lambda_{s-t} \right] + K \qquad (3)$$

$$\operatorname{cov}(r_s, r_t) \approx \frac{1}{n} \left[ \lambda_{s+t} + \lambda_{s-t} + 2\rho_s \rho_t \lambda_0 - 2\rho_t \lambda_s - 2\rho_s \lambda_t \right]. \qquad (4)$$

Here $K$ is a function of the fourth cumulant of the residual variate $\eta$,
$\rho_s = E(x_t x_{t+s}) / E(x_t^2)$, and $\lambda_s = \sum_\nu \rho_\nu \rho_{\nu+s}$. If we set $j = 2$ in formula (2·3·10)
we clearly obtain a relation equivalent to (3) for the normal case, when
$K = 0$. Formula (4) is interesting in that it does not involve any of the
moments of the residual variate. We can very simply show that this property
is a general one for the ratios of quadratic forms occurring in the analysis
of stationary series.

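Consider the quadratic forms in the residuals

$$a = \eta' A \eta, \qquad b = \eta' B \eta.$$

If $\eta_t$ has cumulants $0, \kappa_2, \kappa_3, \kappa_4, \ldots$, then we readily find that

$$\bar{a} = E(a) = \kappa_2 \operatorname{tr} A$$

$$\operatorname{cov}(a, b) = \kappa_4 \sum_i a_{ii} b_{ii} + 2\kappa_2^2 \operatorname{tr} AB.$$

Here we have assumed that the matrices $A$, $B$ are symmetric and have
elements $a_{ij}$, $b_{ij}$, so that $\operatorname{tr} A = \sum a_{ii}$.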

Consider now the ratio $r = a/b$. We find

$$E(r) = \frac{\bar{a}}{\bar{b}} \left( 1 + O(n^{-1}) \right) = \frac{\operatorname{tr} A}{\operatorname{tr} B} \left( 1 + O(n^{-1}) \right) \qquad (5)$$

and

$$\operatorname{var}(r) = \frac{\bar{a}^2}{\bar{b}^2} \left[ \frac{\operatorname{var}(a)}{\bar{a}^2} + \frac{\operatorname{var}(b)}{\bar{b}^2} - \frac{2 \operatorname{cov}(a, b)}{\bar{a}\bar{b}} \right] \left( 1 + O(n^{-1}) \right)$$

$$= \left( 1 + O(n^{-1}) \right) \frac{1}{\bar{b}^4} \left\{ \kappa_4 \left[ \bar{b}^2 \sum a_{ii}^2 + \bar{a}^2 \sum b_{ii}^2 - 2\bar{a}\bar{b} \sum a_{ii} b_{ii} \right] + 2\kappa_2^2 \left[ \bar{b}^2 \operatorname{tr} A^2 + \bar{a}^2 \operatorname{tr} B^2 - 2\bar{a}\bar{b} \operatorname{tr} AB \right] \right\}. \qquad (6)$$

Now, the coefficient of $\kappa_4$ in (6) can vanish under a variety of conditions,
and in particular it will vanish when $a_{ii}$ and $b_{ii}$ are independent of $i$. This
will approximately be the case for all quadratic forms occurring in the
analysis of a stationary time series, since these will have matrices which
are approximately Laurent matrices [see equation (2·3·1) for example] so
that the elements of the principal diagonal are approximately constant.
That is, all quotients which we shall expect to have to deal with have
means and variances [and covariances, by the same proof] which are
asymptotically independent of the residual variate's distribution function.
This is a result of some importance, since our knowledge of the $\eta$
distribution in any practical case is usually of the scantiest.

2·5. Estimation, nondeterministic case

We shall adopt the least square estimation criterion, so that the estimating
relation is obtained by minimising the residual sum of squares as given by
(2·2·8). That is, if $G(\omega)$ and $v$ have least squares estimates $\hat{G}(\omega)$ and $\hat{v}$, then

$$\frac{1}{2\pi} \int_0^{2\pi} \frac{f(\omega)}{\hat{G}(\omega)}\, d\omega = \text{min.} = \hat{v}. \qquad (1)$$

The minimisation is performed with respect to the unknown parameters of
$G(\omega)$, which we shall denote by $\theta_1, \theta_2, \ldots, \theta_p$. However, the minimisation (1)
is not a free one, since $G(\omega)$ is conditioned by equation (1·5·4). This is an
equation which holds identically for all modifications of $G(\omega)$, such as
parameter variation, so that it will also hold for $\hat{G}(\omega)$:

$$\int_0^{2\pi} \log \hat{G}(\omega)\, d\omega = 0. \qquad (2)$$

Combining (1) and (2) we obtain the fundamental estimation equation,

$$\frac{1}{2\pi} \int_0^{2\pi} \left[ \frac{f(\omega)}{\hat{G}(\omega)} + a^{-1} \log \hat{G}(\omega) \right] d\omega = \text{min.} \qquad (3)$$

where $a^{-1}$ is a Lagrangian multiplier.

If the minimisation (3) is carried out freely with respect to the function
$\hat{G}(\omega)$, then the calculus of variations gives

$$-\frac{f(\omega)}{[\hat{G}(\omega)]^2} + \frac{a^{-1}}{\hat{G}(\omega)} = 0 \qquad (4)$$

or $\hat{G}(\omega) = a f(\omega)$, a result which should scarcely surprise us. In general,
however, minimisation may take place only with respect to a limited number
of parameters.

For theoretical calculations it is convenient to work in terms of the
periodogram, as we have done in equations (1)-(4), but for practical
estimation a far preferable method is to use the correlogram [see equation
(2·2·7)] so that the estimation equation is rather

$$\sum_s \hat{\gamma}_s C_s = \text{min.} = \hat{v} \qquad (5)$$

where $[\hat{G}(\omega)]^{-1} = \sum \hat{\gamma}_s z^s$. The advantage is that we work in terms of a sum
rather than an integral, and although the sum is theoretically an infinite
one, the coefficients converge so quickly that it is usually sufficient to
consider between 5 and 20 terms. Further, there is generally no difficulty
in incorporating condition (2) at an early stage of the argument, so that
the resultant minimisation with respect to parameters may be carried out
freely.

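Suppose, for instance, that $F(\omega)$ is rational in $z$, so that

$$F(\omega) = K\, \frac{\prod (z - \alpha)(z^{-1} - \alpha)}{\prod (z - \beta)(z^{-1} - \beta)}. \qquad (6)$$

Since

$$\int_0^{2\pi} \log (z - \alpha)(z^{-1} - \alpha)\, d\omega = 0 \qquad (|\alpha| < 1) \qquad (7)$$

we find

$$G(\omega) = \frac{\prod (z - \alpha)(z^{-1} - \alpha)}{\prod (z - \beta)(z^{-1} - \beta)} \qquad (8)$$

so that $\gamma_s$ is the coefficient of $z^s$ in the Laurent expansion of

$$\frac{\prod (z - \beta)(z^{-1} - \beta)}{\prod (z - \alpha)(z^{-1} - \alpha)}. \qquad (9)$$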

Suppose that we consider, as example, a second order autoregression

$$x_t + a_1 x_{t-1} + a_2 x_{t-2} = \eta_t. \qquad (10)$$

Then, by (5) and (9), we must minimise

$$(1 + a_1^2 + a_2^2)\, C_0 + 2 (a_1 + a_1 a_2)\, C_1 + 2 a_2 C_2 \qquad (11)$$

and so obtain readily the equations

$$C_1 + \hat{a}_1 C_0 + \hat{a}_2 C_1 = 0$$
$$C_2 + \hat{a}_1 C_1 + \hat{a}_2 C_0 = 0 \qquad (12)$$

which, on comparison with Wold's equation (312), prove to be the classical
estimation equations for the autoregressive scheme, as used by Yule.
However, it is only for the autoregressive scheme that the least square
estimates coincide with those that have been customary. Consider, for
example, the first order moving average

$$x_t = \eta_t - b\, \eta_{t-1} \qquad (|b| < 1). \qquad (13)$$

We find from (5), (9) that the expression to be minimised is

$$\frac{1}{1 - b^2} \left[ C_0 + 2b\, C_1 + 2b^2 C_2 + 2b^3 C_3 + \cdots \right]. \qquad (14)$$

The estimate of $b$ thus obtained is a function of the entire correlogram,
although nearly all the weight falls on the earlier coefficients. It may
easily have a variance only one quarter of that of the estimate obtained
from $C_0$ and $C_1$ [see ref. 58].
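
Both examples of this section can be tried numerically. The sketch below is an added illustration and not part of the original text: it solves the pair of linear equations (12) for a second order autoregression, and minimises expression (14) over a grid for a first order moving average. The autocovariances fed in are the theoretical ones for hypothetical schemes with $a_1 = -0.9$, $a_2 = 0.2$ and with $b = 0.5$; the function names and the grid search are illustrative choices, and in practice any one-dimensional minimiser could replace the grid.

```python
import numpy as np

def fit_ar2(C0, C1, C2):
    """Solve equations (12) for a_1, a_2 in x_t + a_1 x_{t-1} + a_2 x_{t-2} = eta_t."""
    A = np.array([[C0, C1], [C1, C0]])
    a1, a2 = np.linalg.solve(A, [-C1, -C2])
    return float(a1), float(a2)

def ma1_criterion(b, C):
    """Expression (14): (1 / (1 - b^2)) * [C_0 + 2 b C_1 + 2 b^2 C_2 + ...]."""
    s = np.arange(1, len(C))
    return float((C[0] + 2 * np.sum(b ** s * C[1:])) / (1 - b ** 2))

# autocorrelations of the autoregression with a_1 = -0.9, a_2 = 0.2
a1_hat, a2_hat = fit_ar2(1.0, 0.75, 0.475)

# autocovariances of x_t = eta_t - 0.5 eta_{t-1} with unit residual variance
C = np.array([1.25, -0.5, 0.0, 0.0, 0.0])
grid = np.linspace(-0.95, 0.95, 1901)
b_hat = float(grid[np.argmin([ma1_criterion(b, C) for b in grid])])
```

With exact autocovariances both procedures return the generating coefficients, and the minimum of (14) equals the residual variance.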

2·6. Variance of estimates

In the case of independent variates least square estimates are usually
consistent, and have certain optimum properties. Wald [ref. 52] has given
a general set of sufficient conditions for the consistency and asymptotic
optimality of maximum likelihood estimates which, when specialised to
the present case (we assume normality for the moment), become that

(i) $\dfrac{\partial^j}{\partial \theta^j} \left[ \dfrac{1}{G(\omega)} \right]$ exists and has a Fourier expansion $(j = 1, 2, 3)$ in some $\theta$ interval
around the true value and

(ii) $\displaystyle\int_0^{2\pi} \left( \frac{\partial \log G}{\partial \theta} \right)^2 d\omega \neq 0$.

To these must be added the previous conditions that $G$ and $G^{-1}$ possess
Fourier expansions. These conditions will in general cover also the case
of non-normality although in this more general case the least-squares
estimate is only asymptotically optimum among those estimates derived
from linear relations in the autocovariances.

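Let us now denote the residual sum of squares by $U$, $\dfrac{\partial U}{\partial \theta}$ by $U_1$, $\dfrac{\partial^2 U}{\partial \theta^2}$ by
$U_{11}$, and the corresponding quantities for which $\theta$ assumes its estimated
value $\hat{\theta}$, by $\hat{U}$, $\hat{U}_1$, $\hat{U}_{11}$. Wald's conditions are sufficient to permit the
expansion of $\hat{U}_1$ in a Taylor series

$$\hat{U}_1 = 0 = U_1 + (\hat{\theta} - \theta)\, U_{11} + O(n^{-1}) \qquad (1)$$

so that

$$\hat{\theta} - \theta \approx -\frac{U_1}{U_{11}} \qquad (2)$$

and

$$\operatorname{var} \hat{\theta} = E(\hat{\theta} - \theta)^2 \approx E\left( \frac{U_1}{U_{11}} \right)^2 \approx \frac{E(U_1^2)}{[E(U_{11})]^2}. \qquad (3)$$

Now, by equation (2·3·10) and the identity $\int_0^{2\pi} \log G(\omega)\, d\omega = 0$,

$$E(U_1^2) \approx \frac{n v^2}{\pi} \int_0^{2\pi} \left( \frac{\partial \log G}{\partial \theta} \right)^2 d\omega \qquad (4)$$

$$E(U_{11}) \approx \frac{n v}{2\pi} \int_0^{2\pi} \left( \frac{\partial \log G}{\partial \theta} \right)^2 d\omega \qquad (5)$$

so that

$$\operatorname{var} \hat{\theta} \approx \left[ \frac{n}{4\pi} \int_0^{2\pi} \left( \frac{\partial \log G}{\partial \theta} \right)^2 d\omega \right]^{-1}. \qquad (6)$$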

T e in tte , same ]1 way ^ ref - w 

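of $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p$ is asymptotically

$$\left[ \frac{n}{4\pi} \int_0^{2\pi} \left( \frac{\partial \log G}{\partial \theta_j} \right) \left( \frac{\partial \log G}{\partial \theta_k} \right) d\omega \right]^{-1} \qquad (7)$$

$(j, k = 1, 2, \ldots, p)$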

I ^totia'ei’i,r ’ m °T Uted *■ >“• ™i*»« 2.-/.. 

gression ot , Moving .4^ » tta""' 1 P ~ “ ei “‘ eI “ 

say. Substitution in (7) then gives 


271 


gi (/— k) CD 

ThT 


s cZ co 


( 9 ) 




independently of whether the power in (8) is positive or negative, which
indicates an interesting symmetry between the two schemes.

Formula (7) holds generally, independently of the residual variate's
distribution, since the estimates $\hat{\theta}$ are functions of ratios of quadratic
functions (easiest seen by noting that the estimates are unchanged if the
estimation equation (2·5·5) is divided through by $C_0$).

It may be shown that the least square estimates are asymptotically
optimum in the sense that the total variance, $|A|$, is a minimum, if we
confine ourselves to those estimates derived from second order functions
of the observations. In the case of normally distributed residuals the
minimum is an absolute one.
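
As a check on the asymptotic variance formula, in which the variance of a single estimate is the reciprocal of $(n/4\pi)\int_0^{2\pi} (\partial \log G / \partial \theta)^2\, d\omega$, consider a first order autoregression $x_t + a x_{t-1} = \eta_t$, for which the classical large-sample variance of the estimated coefficient is $(1 - a^2)/n$. The sketch below is an added illustration, not part of the original text; the values $a = -0.6$, $n = 200$ and the function name are hypothetical.

```python
import numpy as np

def asymptotic_variance(dlogG, n):
    """[(n / 4 pi) * integral_0^{2 pi} (d log G / d theta)^2 d omega]^{-1} on a uniform grid."""
    # (1 / 2 pi) * integral equals the grid mean, so (n / 4 pi) * integral = (n / 2) * mean
    return 1.0 / (0.5 * n * float(np.mean(dlogG ** 2)))

N, n, a = 4096, 200, -0.6
omega = 2 * np.pi * np.arange(N) / N
u = np.exp(-1j * omega)
# G = |1 + a u|^{-2}, so d log G / d a = -2 Re[u / (1 + a u)]
dlogG = -2 * np.real(u / (1 + a * u))
var_a = asymptotic_variance(dlogG, n)
```

The result should agree with $(1 - a^2)/n = 0.0032$ for these values.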

2·7. Tests of fit

We shall now turn our attention to the test problem, the problem of
deciding between a number of more or less distinct hypotheses. The
Neyman-Pearson test theory shows that, on a certain criterion, the statistic which
discriminates most efficiently between two specified hypotheses $H_1$, $H_2$, is
the likelihood ratio. If the values of any of the parameters are unspecified
then the appropriate statistic is the ratio of maximum likelihoods [refs.
35, 51, 54], at least asymptotically. This is equivalent to the ratio of
minimised residual variances

$$\lambda = \frac{\hat{v}_2}{\hat{v}_1} \qquad (1)$$

if the variates are normally distributed. In practice we have generally no
idea of the variate distribution, but use $\lambda$ anyway, on the grounds that

(a) least-square statistics are relatively easy to calculate
and (b) have a relatively simple distribution theory,

(c) few residual variates have a distribution which is radically
non-normal,

and (d) even $\lambda$ has certain limited optimum properties.

Despite statement (b) the calculation of the $\lambda$ distribution is in general
far from easy. We note one important class of exceptions, however, those
cases for which hypothesis $H_2$ includes $H_1$. (Thus, for example, the
hypothesis of a third order moving average includes those of a first or second
order moving average, the hypothesis of stationarity includes that of
periodicity.) Of course, when $H_1$ is merely a special case of $H_2$ we can no
longer speak of discrimination, since the hypotheses are no longer mutually
exclusive, as is tacitly assumed in most test theory. The statistic $\lambda$ tests
whether the extra parameters of $H_2$ allow a significant improvement in
fit, or, if we like, tests the fit of $H_1$ relative to $H_2$.

Suppose that hypothesis $H_1$ entails $p$ undetermined parameters, while $H_2$
involves an extra $q$ parameters, and so has $p + q$ altogether. We shall now
briefly prove that

$$\psi^2 = (n - p - q) \log \frac{\hat{v}_1}{\hat{v}_2} \qquad (2)$$

is, on hypothesis $H_1$, asymptotically distributed as $\chi^2$ with $q$ degrees of
freedom [see refs. 55, 57, 53].

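In equations (2·6·4), (2·6·5) we established the relation

$$E(U_1^2) = 2v\, E(U_{11}). \qquad (3)$$

For the multiparameter case we can prove in exactly the same way

$$E(U_1 U_1') = 2v\, E(U_{11}) \qquad (4)$$

$$E(U U_{11}) = (n + 2)\, v\, E(U_{11}) = \tfrac{1}{2} (n + 2)\, E(U_1 U_1') \qquad (5)$$

where $U_1$ and $U_{11}$ are now respectively the vector and matrix of first and
second differential coefficients of $U$ w.r.t. $\theta_1, \theta_2, \ldots, \theta_p$. We shall denote
the vectors of parameters and parameter estimates by $\theta$, $\hat{\theta}$ respectively.

Now, expanding $\log \hat{U}$ and $\dfrac{\partial}{\partial \theta} \log \hat{U}$ in a Taylor series, we have

$$\log \hat{U} = \log U + (\hat{\theta} - \theta)' \frac{U_1}{U} + \frac{1}{2} (\hat{\theta} - \theta)' \left[ \frac{U_{11}}{U} - \frac{U_1 U_1'}{U^2} \right] (\hat{\theta} - \theta) + \cdots \qquad (6)$$

$$0 = \frac{\partial}{\partial \theta} \log \hat{U} = \frac{U_1}{U} + \left[ \frac{U_{11}}{U} - \frac{U_1 U_1'}{U^2} \right] (\hat{\theta} - \theta) + \cdots \qquad (7)$$

where the terms will decrease regularly in powers of $n^{-1/2}$, since $\hat{\theta} - \theta = O(n^{-1/2})$.
Now, by (7)

$$\hat{\theta} - \theta \approx -\left( U_{11} - \frac{U_1 U_1'}{U} \right)^{-1} U_1 \qquad (8)$$

and, substituting this approximate solution in (6), we have

$$\log U - \log \hat{U} \approx \tfrac{1}{2}\, U_1' \left( U U_{11} - U_1 U_1' \right)^{-1} U_1. \qquad (9)$$

Equation (9) gives an approximate expression for the relative reduction in
residual sum of squares due to fitting. Now, making use of the fact that
$|E(U_{11})| \neq 0$, and of equations (4), (5), we find that

$$\log \frac{U}{\hat{U}} \approx \tfrac{1}{2}\, U_1' \left[ E(U U_{11} - U_1 U_1') \right]^{-1} U_1 = \frac{1}{n}\, U_1' \left[ E(U_1 U_1') \right]^{-1} U_1 \qquad (10)$$

so that $n \log \dfrac{U}{\hat{U}}$ would be asymptotically distributed as $\chi^2$ with $p$ d.f., if
only $U_1$ were asymptotically normally distributed. This will in general be
the case, however, as we see by an application of the Bernstein extension
of the Central Limit Theorem [ref. 7].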

Suppose now that for the two hypotheses $H_1$, $H_2$ the minimum residual sum
of squares $\hat{U}$ has values $\hat{U}_p$, $\hat{U}_{p+q}$ respectively. Then $n \log \dfrac{U}{\hat{U}_p}$, $n \log \dfrac{U}{\hat{U}_{p+q}}$ are
on hypothesis $H_1$ asymptotically distributed as $\chi^2$ with $p$, $p + q$ d.f.
respectively, $p$ d.f. being common. Thus, by the partition theorem for $\chi^2$ variates

$$n \log \frac{U}{\hat{U}_{p+q}} - n \log \frac{U}{\hat{U}_p} = n \log \frac{\hat{U}_p}{\hat{U}_{p+q}} \qquad (11)$$

is asymptotically distributed as $\chi^2$ with $q$ d.f. While not directly indicated
in this derivation, the term $n$ has been modified to $n - p - q$ in (2), to allow
for "lost degrees of freedom" [cf. ref. 53].

The practical form of the test is, then, that a significantly high value
of $\psi^2$ indicates bad fit, while a value in the neighbourhood of expectation
indicates good fit, at least if we restrict ourselves to the alternatives
permitted by $H_2$.

The simplest example of a $\psi^2$ statistic is that obtained on a comparison
of two autoregressive schemes of different orders. For an autoregressive
scheme of order $p$ we find that

$$\hat{v}_p = \frac{D_p}{D_{p-1}} \qquad (12)$$

where

$$D_p = \begin{vmatrix} C_0 & C_1 & \cdots & C_p \\ C_1 & C_0 & \cdots & C_{p-1} \\ \vdots & & & \vdots \\ C_p & C_{p-1} & \cdots & C_0 \end{vmatrix} \qquad (13)$$

so that for autoregressive schemes of order $p$, $p + q$

$$\lambda = \frac{\hat{v}_2}{\hat{v}_1} = \frac{D_{p+q}}{D_{p+q-1}} \cdot \frac{D_{p-1}}{D_p} \qquad (14)$$

[see ref. 12] and

$$\psi^2 = -(n - p - q) \log \lambda \qquad (15)$$

may, for sufficiently large $n$, be tested as a $\chi^2$ variate with $q$ d.f. For other
schemes than the autoregressive explicit expressions for the variance
estimates do not exist [see eq. (2·5·14)], and these must be obtained by an
iterative minimisation.
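
The determinantal form of the test for autoregressive schemes lends itself to direct computation. The sketch below is an added illustration and not part of the original text: the function names are hypothetical, the convention $D_{-1} = 1$ (so that the order-zero variance is $C_0$) is an assumption made here for convenience, and the autocovariances used are the exact ones $C_s = 0.5^s$ of a first order scheme, so that fitting further coefficients should bring no improvement.

```python
import numpy as np

def toeplitz_det(C, p):
    """D_p: determinant of the (p+1) x (p+1) matrix with (i, j) element C_|i-j|."""
    idx = np.abs(np.arange(p + 1)[:, None] - np.arange(p + 1)[None, :])
    return float(np.linalg.det(np.asarray(C, dtype=float)[idx]))

def ar_residual_variance(C, p):
    """v_p = D_p / D_{p-1}, taking D_{-1} = 1 (an assumed convention)."""
    return toeplitz_det(C, p) / (toeplitz_det(C, p - 1) if p > 0 else 1.0)

def psi_squared(C, p, q, n):
    """-(n - p - q) log(lambda), with lambda the ratio of residual variances of orders p+q and p."""
    lam = ar_residual_variance(C, p + q) / ar_residual_variance(C, p)
    return -(n - p - q) * float(np.log(lam))

C = 0.5 ** np.arange(6)               # autocovariances of a first order autoregression
psi2 = psi_squared(C, 1, 2, 100)      # order 1 tested against order 3
```

With these exact autocovariances the statistic is zero to rounding error; with empirical autocovariances it would be referred to the $\chi^2$ distribution with $q = 2$ degrees of freedom.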

2 * 8 . Tests of fit with constant counterhypothesis 

The $\chi^2$ test of the previous section measures the fit of $H_1$ relative to the
alternatives permitted by $H_2$, the counterhypothesis. Now, in general the
most desirable class of alternatives will be a very wide one: the class of
all stationary processes, for instance, or the class of all stationary and
purely nondeterministic processes. However, if $H_2$ is to be so comprehensive
we see that we shall have some difficulty in calculating the corresponding
variance estimate, $\hat{v}$, since a hypothesis of the generality of those we have
mentioned entails an infinite number of parameters. For example, if $H_2$
is the hypothesis that the process $\{x_t\}$ is stationary and purely
nondeterministic, then

$$x_t = \eta_t + b_1 \eta_{t-1} + b_2 \eta_{t-2} + \cdots \tag{1}$$

and we may regard $(b_1, b_2, \ldots)$ as being distinct parameters of this general
hypothesis. It is obvious that we cannot estimate all these parameters from
a finite sample.

However, in this case we note that the sequence $b_1, b_2, \ldots$ must tend
to zero, so that it is actually only the earlier coefficients which are of
importance. Thus, it should in practice be possible to obtain a sufficient
approximation to the actual process by estimating only a number of the
earlier coefficients in (1), i.e., by graduating the series with a high order
moving average.

Suppose that the graduation is of order $k$, and that the resultant least-squares
estimate of the residual variance is $\hat{v}_0$. Suppose further that the
fitting of a specified $p$-parameter hypothesis $H$ leads to a variance estimate
$\hat{v}$. Then, as before, for sufficiently large $n$ and $k$

$$\psi^2 = (n - k) \log \frac{\hat{v}}{\hat{v}_0} \tag{2}$$

is asymptotically distributed as $\chi^2$ with $k - p$ d.f. The advantage of (2) is
that we now have a constant counterhypothesis, which is fairly sure to
include all likely null hypotheses, and against which the fit of all such
hypotheses may be tested.

This derivation of the test is, of course, an extremely superficial one.
Caution is necessary in the application of (2), the most sensitive point being
the choice of the graduation order, $k$. For a fuller description and commentary
see the original paper [ref. 57].

In practice, an autoregressive graduation is to be preferred to any other,
as $\hat{v}_0$ may then be expressed directly in terms of sample functions [see
eq. (2·8·12)] without the need first to solve explicitly for the fitted
“parameters.” Of course, if the process is to be representable as an autoregression,
then $[F(\omega)]^{-1}$ must be expandable in a Fourier series, which is a restriction,
but a minor one.
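
A test against an autoregressive graduation might be sketched as follows. The sketch rests on assumptions that should be stated plainly: the statistic is read as $(n - k)\log(\hat{v}/\hat{v}_0)$ with $k - p$ d.f., the null hypothesis is taken, purely for illustration, to be an autoregression of order $p$, and both variance estimates are obtained by solving the Yule-Walker equations on biased sample autocovariances. The function names are hypothetical.

```python
import numpy as np


def autocov(x, max_lag):
    """Biased sample autocovariances about the sample mean (assumed convention)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[: n - s], x[s:]) / n for s in range(max_lag + 1)])


def yule_walker_variance(c, order):
    """Residual variance of an autoregression of the given order, via the Yule-Walker equations."""
    if order == 0:
        return c[0]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    coeffs = np.linalg.solve(c[idx], c[1 : order + 1])
    return c[0] - np.dot(coeffs, c[1 : order + 1])


def constant_counterhypothesis_test(x, p, k):
    """Return (psi2, d.f.) for an AR(p) null against an AR(k) graduation, k > p."""
    n = len(x)
    c = autocov(x, k)
    v_hat = yule_walker_variance(c, p)   # estimate under the null hypothesis
    v_zero = yule_walker_variance(c, k)  # estimate under the graduation
    return (n - k) * np.log(v_hat / v_zero), k - p
```

The returned statistic would then be referred to a $\chi^2$ table with the returned degrees of freedom. The choice of `k` remains the sensitive point noted above, and nothing in the sketch settles it.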


2·9. Discriminatory tests


For the general discriminatory test, the hypotheses $H_1$ and $H_2$ corresponding
to the statistic

    $\lambda$, the ratio of the variance estimates $\hat{v}_1$ and $\hat{v}_2$,    (1)

bear no particular relation to one another. This increases the difficulty of
coping with the unknown parameter values, and makes the whole treatment
more approximate, in that we can only take account of differences in
$\hat{v}_1$ and $\hat{v}_2$ of order $O(1)$, whereas in the previous two sections we considered
differences of order $O(n^{-1})$.

Suppose that the minimised residual sum of squares for a hypothesis $H$ is
$\hat{U} = n\hat{v}$. Then, since $U$ is a minimum w.r.t. $\theta_1, \theta_2, \ldots, \theta_p$, $\hat{U}$ has a distribution
which is asymptotically independent of the parameter estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$,
whether $H$ is the correct hypothesis or no [see 2·6]. Thus, $\hat{U}$ is asymptotically
distributed as though the equality held

$$\hat{U} = \frac{n}{2\pi} \int_0^{2\pi} \frac{f(\omega)}{\bar{G}(\omega)}\, d\omega \tag{2}$$

where $\bar{G}(\omega)$ is the least-squares estimate of $G(\omega)$ when $f(\omega)$ assumes its
expectation value. That is, $\hat{U}$ is asymptotically distributed as a certain linear
function of the sample autocovariances. Thus, if $H(\omega)$ is the actual spectral
function, then by (2·3·8) the $r$th cumulant of $\hat{U}$ is

$$k_r = 2^{r-1} (r-1)!\, \frac{n}{2\pi} \int_0^{2\pi} \left[ \frac{H(\omega)}{\bar{G}(\omega)} \right]^r d\omega \tag{3}$$

Similarly, the joint cumulants of $\hat{U}_1$ and $\hat{U}_2$ are given by

$$k_{rs} = 2^{r+s-1} (r+s-1)!\, \frac{n}{2\pi} \int_0^{2\pi} \left[ \frac{H(\omega)}{\bar{G}_1(\omega)} \right]^r \left[ \frac{H(\omega)}{\bar{G}_2(\omega)} \right]^s d\omega \tag{4}$$

Equation (4) gives us, at least in theory, a method of calculating the $\lambda$
distribution, by using a bivariate type A series. This is not a very practical
procedure, however, and we shall obtain a simpler one, at the price of
another approximation.

Suppose that $G_2(\omega)$ corresponds to the null hypothesis, so that
$H(\omega)/G_2(\omega) \approx$ const., and suppose that $n$ is so large that $\hat{U}_1$ and $\hat{U}_2$ may be considered
as normally distributed. Then we find, with the help of a formula due to
Geary [ref. 20], that

$$\frac{\cdots}{\sqrt{1 - 2\mu\lambda + \nu\lambda^2}}$$

is asymptotically normally distributed with zero mean and unit variance.
The quantities $\mu$ and $\nu$ are given by



and so depend upon $H(\omega)$ and only upon $H(\omega)$, whose parameters must in
practice be chosen in the neighbourhood of their estimated values.


2·10. Harmonic components. The periodogram

We have hitherto considered the estimation and testing only of a purely
nondeterministic scheme, so the inclusion of a deterministic component is
required to complete the treatment. In the main we shall restrict our
attention to those deterministic components consisting of a finite number of
sinusoidal terms,

$$\psi_t = \sum_{\nu=1}^{q} (A_\nu \sin \omega_\nu t + B_\nu \cos \omega_\nu t) \tag{1}$$

as being almost the only case of practical interest.

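In this section we shall assume that the nondeterministic component is
random, i.e., that $x_t = \psi_t + \eta_t$. Then we readily find that the least-squares
estimates of $A_\nu$, $B_\nu$ and $\omega_\nu$ are approximately given by [see the next section
for more detail]

$$\hat{A}_\nu = \frac{2}{n} \sum x_t \sin \hat\omega_\nu t, \qquad \hat{B}_\nu = \frac{2}{n} \sum x_t \cos \hat\omega_\nu t,$$

$$f(\hat\omega_\nu) = \frac{1}{n} \left[ \left( \sum x_t \sin \hat\omega_\nu t \right)^2 + \left( \sum x_t \cos \hat\omega_\nu t \right)^2 \right] = \max. \tag{2}$$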



so that the least-square estimates of the $\omega_\nu$'s are located at the peaks of
the periodogram $f(\omega)$, and the magnitudes of these peaks provide the least-
square estimates of the corresponding amplitudes, $A_\nu^2 + B_\nu^2$. It may be
shown that these estimates have least variance of any yielded by second
order functions of the observations, despite the exceptional character of
the $\omega_\nu$ estimate [see 2·12].
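
The location of a single harmonic term at the periodogram peak may be illustrated by a short Python sketch. It assumes the normalisation $f(\omega) = n^{-1}[(\sum x_t \sin\omega t)^2 + (\sum x_t \cos\omega t)^2]$ with $t = 1, \ldots, n$, the amplitude estimates $(2/n)\sum x_t \sin\hat\omega t$ and $(2/n)\sum x_t \cos\hat\omega t$, and a plain grid search over $(0, \pi)$. The grid search and the function names are choices made here for illustration, not a procedure prescribed by the text.

```python
import numpy as np


def periodogram(x, omegas):
    """f(w) = (1/n) [ (sum x_t sin wt)^2 + (sum x_t cos wt)^2 ], with t = 1..n."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    wt = np.outer(omegas, t)
    a = np.sin(wt) @ x
    b = np.cos(wt) @ x
    return (a**2 + b**2) / len(x)


def fit_single_harmonic(x, grid_size=4000):
    """Grid-search the periodogram peak, then estimate A and B at that frequency."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    omegas = np.linspace(0.0, np.pi, grid_size + 2)[1:-1]  # interior of (0, pi)
    w_hat = omegas[np.argmax(periodogram(x, omegas))]
    t = np.arange(1, n + 1)
    a_hat = 2.0 / n * np.dot(x, np.sin(w_hat * t))
    b_hat = 2.0 / n * np.dot(x, np.cos(w_hat * t))
    return a_hat, b_hat, w_hat
```

Since the frequency estimate is far more precise than the others, the grid must be fine compared with $1/n$; a coarse grid would chiefly damage the separate estimates of $A$ and $B$ through the phase.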

We see thus that the Schuster periodogram analysis, of late years
regarded with so much uncertainty, is actually the appropriate technique
for the location and estimation of periodic components. That it should
ever have fallen into disfavour is a result of mistaken (although extremely
understandable!) attempts to use it for purposes for which it was not
intended [see 2·12]. It is notable that the periodogram, which with a 55 year
history behind it may fairly be claimed to be the time series analyst’s
oldest weapon, has survived unchanged and shown itself scarcely capable
of improvement. The same is true of the classical periodogram test, which
we shall now describe, derived by Fisher (1929).

If the number of observations is odd and

$$n = 2m + 1, \qquad y_j = f(2\pi j / n) \quad (j = 1, 2, \ldots, m), \tag{3}$$

then Fisher takes the greatest of the ordinates $y_j$ (let it be $y_a$) and shows
that the probability of obtaining a value of

$$g = y_a \Big/ \sum_{j=1}^{m} y_j \tag{4}$$

greater than that observed is

$$P = \sum_j (-1)^{j-1} \binom{m}{j} (1 - jg)^{m-1} \tag{5}$$

where the summation is continued over all terms in which $(1 - jg)$ is positive.
The assumption is that the observations are distributed normally and
independently about zero.
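
Fisher's test is easily computed. The Python sketch below assumes that the ordinates are taken at the frequencies $2\pi j/n$, $j = 1, \ldots, m$, with $n = 2m + 1$, and that the tail probability is the alternating sum $\sum_j (-1)^{j-1}\binom{m}{j}(1 - jg)^{m-1}$ over the terms with $1 - jg > 0$. The use of `numpy.fft.rfft` to obtain the ordinates and the function names are choices made here for illustration.

```python
import math

import numpy as np


def fisher_g(x):
    """Return (g, m): largest periodogram ordinate over the sum of the m ordinates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % 2 == 0:
        raise ValueError("the test as stated assumes an odd number of observations")
    m = (n - 1) // 2
    # Ordinates at 2*pi*j/n for j = 1..m; the zero-frequency term is dropped.
    y = np.abs(np.fft.rfft(x))[1 : m + 1] ** 2 / n
    return y.max() / y.sum(), m


def fisher_pvalue(g, m):
    """Probability of a ratio exceeding g under normal independent observations."""
    total = 0.0
    for j in range(1, m + 1):
        base = 1.0 - j * g
        if base <= 0.0:
            break  # only terms with (1 - j g) positive enter the sum
        total += (-1) ** (j - 1) * math.comb(m, j) * base ** (m - 1)
    return min(1.0, max(0.0, total))
```

For example, with $m = 2$ and $g = 0.75$ only the first term survives and the probability is $2(1 - 0.75) = 0.5$. For large $m$ the alternating sum may lose accuracy in floating-point arithmetic, so the sketch is intended for moderate series lengths.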

We see how radically the test differs from the usual type: if it were so
that one “degree of freedom” were lost for each of the three estimated
parameters $A$, $B$ and $\omega$, then $g$ would have an expected value $3/n$, whereas
a calculation shows the expectation to be of order $\frac{2}{n} \log \frac{n}{2}$ (consider the
extreme values of the rectangularly distributed variate $\exp[-y_j]$).
However, as soon as a genuine harmonic component appears the minimisation
w.r.t. $\omega$ “stabilises”, and we shall see later that the relative reduction
in residual sum of squares due to fitting is in this case really of the order $3/n$.

As it stands, Fisher’s test is limited in three respects:

(a) Only the largest peak may be tested.

(b) The only null hypothesis is that of independence.

(c) The only ordinates considered are the equidistant ones (3).


The first two restrictions are easily remedied. First, suppose that we
wish to test the second largest ordinate, $y_b$. This will be the case only when
$y_a$ has been found significant, so that we must ipso facto modify our
hypothesis to include a harmonic term with an $\omega$ value in the neighbourhood
of $y_a$'s. However, the distribution of the remaining ordinates will be little
affected by this, and

$$g = \frac{y_b}{\sum_j y_j - y_a} \tag{6}$$

will have approximately the distribution (5) with $m$ replaced by $m - 1$.
Similarly for the third greatest, etc.

Suppose now that we wish to adopt the null hypothesis that the scheme
has a known continuous spectrum, $F(\omega)$. This has the approximate effect
of altering $f(\omega)$'s scale factor by $F(\omega)$ [see (2·2·6)], so that if we define

$$y_j = \frac{f(2\pi j/n)}{F(2\pi j/n)} \tag{7}$$

then $g$ will have the same definition and almost the same distribution as
before.

The third limitation is not so easily removed, but is on the other hand
less serious. If a harmonic term falls midway between two ordinates of the
grid (3) then its amplitude will be reduced by a factor of roughly
$4/\pi^2 \approx 0.41$ at these ordinates, which may or may not be enough to make
it appear nonsignificant. Some headway may be made by considering integrated
values of the periodogram over an $\omega$ interval, see 2·12.
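
The factor $4/\pi^2$ may be checked numerically. The Python sketch below builds a noiseless cosine whose frequency lies midway between two grid frequencies $2\pi j/n$ and $2\pi (j+1)/n$, and compares the periodogram ordinate at the neighbouring grid point with the ordinate at the true frequency. The series length, the index `j` and the function names are arbitrary choices made here.

```python
import numpy as np


def periodogram_ordinate(x, omega):
    """f(w) = (1/n) [ (sum x_t sin wt)^2 + (sum x_t cos wt)^2 ], with t = 1..n."""
    t = np.arange(1, len(x) + 1)
    a = np.dot(x, np.sin(omega * t))
    b = np.dot(x, np.cos(omega * t))
    return (a * a + b * b) / len(x)


def midway_attenuation(n=1001, j=100):
    """Ratio of the grid ordinate to the peak ordinate for a midway harmonic."""
    t = np.arange(1, n + 1)
    omega_mid = 2.0 * np.pi * (j + 0.5) / n  # midway between grid points j and j + 1
    x = np.cos(omega_mid * t)
    at_peak = periodogram_ordinate(x, omega_mid)
    at_grid = periodogram_ordinate(x, 2.0 * np.pi * j / n)
    return at_grid / at_peak
```

The ratio returned should lie close to $4/\pi^2$, the departure being of relative order $1/n$ and arising from the interaction of the positive and negative frequency parts of the cosine.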


2·11. The composite case

Let us now consider a sample from the general stationary process (with
the single restriction that the purely nondeterministic component of the
spectrum be nonzero in $(0, 2\pi)$). Since the deterministic and nondeterministic
components are additive, the residual sum of squares must be expressible as
in 2·2, except that $x_t$ will be replaced by $x_t - \psi_t$. Introducing this
modification in eq. (2·2·8) we find that

$$U = \frac{1}{2\pi} \int_0^{2\pi} \frac{n f(\omega) - 2 A(\omega) X(-\omega) + A(\omega) A(-\omega)}{G(\omega)}\, d\omega \tag{1}$$

where

$$X(\omega) = \sum x_t e^{i\omega t}, \qquad A(\omega) = \sum \psi_t e^{i\omega t}. \tag{2}$$

Suppose that $G(\omega)$ and $A(\omega)$ involve parameters $\theta_1, \theta_2, \ldots, \theta_p$, whose least-
square estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$ are obtained by minimising $U$ of (1). Then,
using the same methods as in 2·6, we find the covariance matrix of $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$
to be asymptotically

$$\Lambda = \left[ \frac{n}{4\pi} \int_0^{2\pi} \frac{\partial \log G(\omega)}{\partial \theta_j} \frac{\partial \log G(\omega)}{\partial \theta_k}\, d\omega + \frac{1}{2\pi v} \int_0^{2\pi} \frac{A_j(\omega) A_k(-\omega)}{G(\omega)}\, d\omega \right]^{-1} \tag{3}$$

where

$$A_j(\omega) = \frac{\partial A(\omega)}{\partial \theta_j}. \tag{4}$$

Of course, formula (3) and its derivation break down if $\Lambda^{-1}$ should be
singular, so that $\Lambda$ does not exist. Such would be the case, for example, if
we should fit a harmonic component where none existed (for then the
$\partial/\partial\omega_\nu$ column in $\Lambda^{-1}$, which is proportional to the amplitude of the
component, would be zero).

The tests of fit of 2·7-2·8 may be extended to the present case, again
under the restriction that $\Lambda$ exist for all hypotheses considered.

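The case of greatest interest is again that for which

$$\psi_t = \sum_{\nu=1}^{q} (A_\nu \sin \omega_\nu t + B_\nu \cos \omega_\nu t). \tag{5}$$

Substituting (5) in (1), and using the relations

$$\sum \sin^2 \omega t \approx \sum \cos^2 \omega t \approx \frac{n}{2}, \qquad \sum \sin \omega t \cos \omega t \approx 0, \tag{6}$$

we find, after some reduction, that

$$U \approx \frac{n}{2\pi} \int_0^{2\pi} \frac{f(\omega)}{G(\omega)}\, d\omega - 2 \sum_\nu \frac{A_\nu a_\nu + B_\nu b_\nu}{G(\omega_\nu)} + \frac{n}{2} \sum_\nu \frac{A_\nu^2 + B_\nu^2}{G(\omega_\nu)} \tag{7}$$

where

$$a_\nu = \sum x_t \sin \omega_\nu t, \qquad b_\nu = \sum x_t \cos \omega_\nu t. \tag{8}$$

The estimation equations for $A_\nu$, $B_\nu$ and $\omega_\nu$ are found from (7) to be

$$\hat{A}_\nu = \frac{2 a_\nu}{n}, \qquad \hat{B}_\nu = \frac{2 b_\nu}{n}, \qquad \frac{f(\hat\omega_\nu)}{G(\hat\omega_\nu)} = \max., \tag{9}$$

so that the harmonic components are chosen from the greatest peaks in
$f(\omega)/G(\omega)$, although the amplitudes are estimated directly from $f(\omega)$.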

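The complete system of estimation equations is

$$\hat{A}_\nu = \frac{2 a_\nu}{n}, \qquad \hat{B}_\nu = \frac{2 b_\nu}{n}, \qquad \frac{f(\hat\omega_\nu)}{G(\hat\omega_\nu)} = \max. \qquad (\nu = 1, 2, \ldots, q)$$

$$\frac{n}{2\pi} \int_0^{2\pi} \frac{f(\omega)}{G(\omega)}\, d\omega - 2 \sum_\nu \frac{f(\hat\omega_\nu)}{G(\hat\omega_\nu)} = \min. \tag{10}$$

$$\int_0^{2\pi} \log G(\omega)\, d\omega = 0,$$

the maximum and minimum being taken with respect to $\omega_\nu$ and
$(\theta_1, \theta_2, \ldots, \theta_p)$ respectively. Applying (3), or working direct from (7), we find
that $\hat{A}_\mu, \hat{B}_\mu, \hat\omega_\mu$ are uncorrelated with $\hat{A}_\nu, \hat{B}_\nu, \hat\omega_\nu$ $(\mu \neq \nu)$, or with the
parameter estimates of $G(\omega)$, $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$. These latter have the same covariance
matrix as before [eq. (2·6·7)], while the covariance matrix of $\hat{A}_\nu, \hat{B}_\nu, \hat\omega_\nu$ is

$$\frac{2 v G(\omega_\nu)}{n} \begin{bmatrix} 1 & 0 & -\dfrac{n B_\nu}{2} \\ 0 & 1 & \dfrac{n A_\nu}{2} \\ -\dfrac{n B_\nu}{2} & \dfrac{n A_\nu}{2} & \dfrac{n^2}{3} (A_\nu^2 + B_\nu^2) \end{bmatrix}^{-1} \tag{11}$$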


The least-squares estimate of a constant mean is given approximately by
$\hat{m} = \frac{1}{n} \sum x_t$. This is uncorrelated with any of the other parameters and has
variance $vG(0)/n$. Regarding the practical solution of the equation system
(10), see ref. 56.
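
The order of magnitude of the sampling errors of a fitted harmonic can be read off numerically. The Python sketch below assumes that the covariance matrix of $(\hat{A}, \hat{B}, \hat\omega)$ has the form $(2vG(\omega)/n)M^{-1}$, where $M$ has unit leading diagonal, off-diagonal elements $-nB/2$ and $nA/2$ in the $\omega$ row and column, and corner element $n^2(A^2 + B^2)/3$. This form follows from the sums of products of the derivatives of $A\sin\omega t + B\cos\omega t$ over $t = 1, \ldots, n$, and is an assumption of the sketch. The argument `noise_scale` stands for $vG(\omega)$, and the function name is hypothetical.

```python
import numpy as np


def harmonic_covariance(a, b, n, noise_scale):
    """Approximate covariance matrix of (A, B, omega) estimates; noise_scale = v * G(omega)."""
    info = np.array(
        [
            [1.0, 0.0, -n * b / 2.0],
            [0.0, 1.0, n * a / 2.0],
            [-n * b / 2.0, n * a / 2.0, n**2 * (a**2 + b**2) / 3.0],
        ]
    )
    return 2.0 * noise_scale / n * np.linalg.inv(info)
```

Under these assumptions the frequency variance, the last diagonal element, reduces to $24\,vG(\omega)/[n^3(A^2 + B^2)]$, so that doubling the length of the series divides it by eight.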
2·12. Miscellaneous periodogram results

One characteristic of the periodogram is almost too well-known to need
mention, namely, that while it gives an estimate of the spectral intensity
$F(\omega)$ at a continuous part of the spectrum, this estimate is inconsistent.
This may be quickest seen in the following manner:

$$f(\omega) = \frac{1}{n} \left[ \left( \sum x_t \sin \omega t \right)^2 + \left( \sum x_t \cos \omega t \right)^2 \right] = \frac{1}{n} \sum_{s=1}^{n} \sum_{t=1}^{n} x_s x_t \cos \omega (s - t) = \sum_{s=-n}^{n} C_s e^{i\omega s} \tag{1}$$

so that

$$E[f(\omega)] = \sum_{s=-n}^{n} \left( 1 - \frac{|s|}{n} \right) \Gamma_s e^{i\omega s} \tag{2}$$

$$\operatorname{var}[f(\omega)] \approx [G(\omega)]^2 \operatorname{var}[f_\eta(\omega)] = [G(\omega)]^2 v^2. \tag{3}$$

Thus, while $E[f(\omega)]$ is equal to $F(\omega)$, apart from a small bias vanishing
with increasing $n$, var $[f(\omega)] = O(1)$, so that the coefficient of variation
does not vanish as $n \to \infty$. This means in practice that the periodogram
presents a wildly irregular appearance, suggesting little or nothing to the eye.
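
The inconsistency is readily exhibited by simulation. The Python sketch below draws independent normal series of unit variance, evaluates the periodogram ordinate $f(\omega) = n^{-1}[(\sum x_t \sin\omega t)^2 + (\sum x_t \cos\omega t)^2]$ at one fixed interior frequency, and returns the coefficient of variation across replications. The frequency, the number of replications, the seed and the function names are arbitrary choices made here.

```python
import numpy as np


def periodogram_at(x, omega):
    """Periodogram ordinate at omega for each row of x, with t = 1..n."""
    n = x.shape[-1]
    t = np.arange(1, n + 1)
    a = x @ np.sin(omega * t)
    b = x @ np.cos(omega * t)
    return (a**2 + b**2) / n


def coefficient_of_variation(n, omega=1.0, replications=2000, seed=0):
    """Coefficient of variation of f(omega) over independent normal series of length n."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((replications, n))
    f = periodogram_at(x, omega)
    return f.std() / f.mean()
```

Calling `coefficient_of_variation(64)` and `coefficient_of_variation(1024)` should give values both near unity: lengthening the series sixteenfold leaves the relative scatter of a single ordinate essentially unchanged.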

That a harmonic term (corresponding to an infinite $F(\omega)$ ordinate) can be
estimated consistently depends upon the fact that the amplitude of such
a harmonic is estimated by $(4/n) f(\omega)$, and has coefficient of variation
$O(n^{-1/2})$.

The root of the trouble is the fact that the periodogram is an empirical
frequency function: it describes the distribution of the “energy” of the
observed fluctuations among the different frequencies of oscillation (from
which follows, incidentally, that it is incorrect to set up a periodogram
graph with the period of oscillation as abscissa, as is sometimes done,
without scaling the ordinates by the Jacobian of the transformation from
frequency to period). The phenomenon of inconsistency is one characteristic
of all empirical frequency functions of a continuous variate, namely, that
while the total frequency in any interval converges to a more or less
determinate quantity with increasing sample size, the point density of
observations never does so, unless the point should be one of infinite
probability density.

We have seen that in the least square approach to the inference problem,
it has been necessary to consider the periodogram as such only when
estimating harmonic terms; otherwise we have worked with periodogram
averages (in actual fact, what is the same thing, linear functions of the
earlier autocovariances). However, in many applications it would be useful
to be able to form a direct estimate of the spectrum, and various methods
of smoothing the periodogram have been suggested in order to bring this
about.

Bartlett [ref. 3] has proposed the estimate provided by the Fourier
transform of the truncated correlogram

$$f_B(\omega) = \sum_{s=-r}^{r} C_s e^{i\omega s} \tag{4}$$

[cf. (1)] where $r$ has usually a value between 15 and 30, depending upon
circumstances. This estimate is justified by the fact that the latter part of
the correlogram of a purely nondeterministic series contains more random
variation than useful information (which is why the earlier coefficients are
always heaviest weighted in the least-square estimation equations).
Alternatively, (4) may be regarded as being derived as the average of the $n/r$
periodograms calculated for $n/r$ series of length $r$. By eq. (2·3·10)

    (5)

so that the variance is $O(r/n)$.
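
The second interpretation, that of averaging the periodograms of $n/r$ consecutive stretches of length $r$, may be sketched in Python as follows. The sketch assumes non-overlapping stretches, discards any incomplete final stretch, and uses the periodogram normalisation $f(\omega) = r^{-1}[(\sum x_t \sin\omega t)^2 + (\sum x_t \cos\omega t)^2]$ within each stretch. It illustrates the averaging idea only and is not claimed to reproduce (4) term by term. The function names are hypothetical.

```python
import numpy as np


def periodogram_at(x, omega):
    """Periodogram ordinate at omega along the last axis of x, with t = 1..length."""
    n = x.shape[-1]
    t = np.arange(1, n + 1)
    a = x @ np.sin(omega * t)
    b = x @ np.cos(omega * t)
    return (a**2 + b**2) / n


def segment_averaged_estimate(x, r, omega):
    """Average of the periodograms of consecutive non-overlapping stretches of length r."""
    x = np.asarray(x, dtype=float)
    k = x.shape[-1] // r  # number of complete stretches
    segments = x[..., : k * r].reshape(x.shape[:-1] + (k, r))
    return periodogram_at(segments, omega).mean(axis=-1)
```

For independent normal variates of unit variance, with $n = 1024$ and $r = 32$, the averaged estimate has a sampling variance of roughly $1/32$ of that of the raw ordinate, in keeping with the order $r/n$.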

Daniell [ref. 11] has suggested smoothing the periodogram by a moving
average, so that if we average over $\omega \pm h$

$$f_D(\omega) = \sum_{s=-n}^{n} C_s e^{i\omega s}\, \frac{\sin sh}{sh} \tag{6}$$

As $h$ increases the variance of the estimate decreases, but so also does
the frequency resolvability. This provides an illustration of the uncertainty
principle enunciated by Grenander [ref. 22] stating that the product of
errors in the estimates of amplitude and frequency has a certain lower bound.
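
That a moving average of the periodogram over $\omega \pm h$ amounts to weighting the autocovariances by $\sin sh / sh$ may be verified numerically. The Python sketch below assumes $C_s = n^{-1}\sum_t x_t x_{t+s}$ without mean correction, so that the periodogram equals $C_0 + 2\sum_{s \geq 1} C_s \cos\omega s$ exactly, and compares a trapezoidal average of the periodogram with the weighted sum. The grid size and the function names are choices made here.

```python
import numpy as np


def autocovariances(x):
    """C_s = (1/n) sum_t x_t x_{t+s} for s = 0..n-1, without mean correction."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - s], x[s:]) / n for s in range(n)])


def periodogram(x, omegas):
    """f(w) = (1/n) [ (sum x_t sin wt)^2 + (sum x_t cos wt)^2 ], with t = 1..n."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    wt = np.outer(omegas, t)
    return ((np.sin(wt) @ x) ** 2 + (np.cos(wt) @ x) ** 2) / len(x)


def daniell_by_averaging(x, omega, h, points=2001):
    """Trapezoidal average of the periodogram over (omega - h, omega + h)."""
    grid = np.linspace(omega - h, omega + h, points)
    vals = periodogram(x, grid)
    step = grid[1] - grid[0]
    return (vals[:-1] + vals[1:]).sum() * step / 2.0 / (2.0 * h)


def daniell_by_lag_window(x, omega, h):
    """Sum of C_s cos(omega s) sin(sh)/(sh) over positive and negative lags."""
    c = autocovariances(x)
    s = np.arange(len(c))
    weights = np.sinc(s * h / np.pi)  # numpy's sinc(u) is sin(pi u) / (pi u)
    terms = c * weights * np.cos(omega * s)
    return terms[0] + 2.0 * terms[1:].sum()
```

The two functions should agree up to the error of the numerical integration. The weighted-sum form makes plain that a wider band $h$ damps the higher autocovariances more heavily.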

It is perhaps unnecessary to add that both Bartlett’s and Daniell’s 
modifications are designed to improve the direct estimation of a continuous 
spectrum; they will both lead to serious underestimation of the amplitude 
of any harmonic component. 

The integrated periodogram is an empirical distribution function, and so
will give consistent estimates of the spectrum. Thus, by eq. (2·3·10) the
cumulants of

    (7)

are asymptotically given by

$$k_r = 2^{r-1} (r-1)!\, n \int [F(\omega)]^r\, d\omega. \tag{8}$$

From this formula we can calculate the cumulants of $f_D(\omega)$, if only $h$
is not too small.

We shall conclude this section by calling attention to a peculiarity of $\hat\omega$,
the least-squares estimate of the frequency of a harmonic component.
The majority of parameter estimates have variance $O(n^{-1})$, but $\hat\omega$ has
variance $O(n^{-3})$ or $O(1)$, depending upon whether the component actually
exists or no. For if there is in reality no harmonic term, then it is clear that
the periodogram can have a peak anywhere, while if the component exists,
equation (11) of the previous section shows that $\hat\omega$ has variance $O(n^{-3})$.

2·13. Multiple series

Practical problems far more often require the analysis of several simul- 
taneous and related series than of a single one. All of the preceding least- 
squares theory may be readily generalised to this case, although the multi- 
plicity does introduce new features (e.g., “identifiability”). In this section 
we shall very briefly consider the least square treatment of a purely 
nondeterministic multiple series. 

Suppose that we have $r$ variates, $x_t = (x_{1t}, x_{2t}, \ldots, x_{rt})$, which are linearly
dependent upon $r$ mutually independent series of residual variates,

$$\epsilon_t = (\epsilon_{1t}, \epsilon_{2t}, \ldots, \epsilon_{rt}),$$

so that

$$x_t = B(T)\, \epsilon_t \tag{1}$$

Here $B(T)$ is an $r \times r$ matrix whose elements are functions of the operator
$T$, defined by

$$T u_t = u_{t-1} \tag{2}$$

Only positive powers of $T$ are assumed to occur, i.e., $x_t$ is a function of
past values of $\epsilon_t$ alone. If $|B(z)|$ has no zeroes on or within the unit circle,
then there is also an autoregressive representation of the multiple process:

$$\epsilon_t = [B(T)]^{-1} x_t = A(T)\, x_t \tag{3}$$

say.

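We shall now define the theoretical and empirical autocovariances and
spectral functions:

$$\Gamma_{jk}(s) = E[x_{j,t+s}\, x_{kt}], \qquad F_{jk}(\omega) = \sum_{s=-\infty}^{\infty} \Gamma_{jk}(s)\, e^{i\omega s}, \qquad F(\omega) = (F_{jk}(\omega)) \tag{4}$$

$$C_{jk}(s) = \frac{1}{n} \sum_{t=1}^{n-s} x_{j,s+t}\, x_{kt} \tag{5}$$

$$f_{jk}(\omega) = \sum_{s=-n}^{n} C_{jk}(s)\, e^{i\omega s}, \qquad f(\omega) = (f_{jk}(\omega)) \tag{6}$$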
We readily find by direct methods from (1) and (3) that

$$F(\omega) = B(e^{i\omega})\, B'(e^{-i\omega}) = [A'(e^{-i\omega})\, A(e^{i\omega})]^{-1} \tag{7}$$

if we assume that the $B$ coefficients have been chosen so that the residuals
have unit variance. The least square estimate of the matrix of spectral
functions $F(\omega)$ is yielded by the relation

$$\frac{1}{2\pi} \int_0^{2\pi} \log |F|\, d\omega + \frac{1}{2\pi} \int_0^{2\pi} \operatorname{tr}[f F^{-1}]\, d\omega = \min. \tag{8}$$

and the resulting parameter estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$ have asymptotic
covariance matrix

    $(j, k = 1, 2, \ldots, p)$    (9)

if the variates are normally distributed. A feature of the multivariate case
is that (9) may not always be extended to the case of non-normal variates.

As before, if hypotheses $H_1$ and $H_2$ involve respectively $p$ and $p + q$
parameters, and $H_2$ includes $H_1$, then

    (10)

is asymptotically distributed as $\chi^2$ with $q$ degrees of freedom. Here

$$\hat{v}_i = \exp \left[ \frac{1}{2\pi} \int_0^{2\pi} \log |\hat{F}_i|\, d\omega \right] \qquad (i = 1, 2) \tag{11}$$

where $\hat{F}_1$, $\hat{F}_2$ are two least square estimates of $F(\omega)$.


The following supplementary references may be given for topics not
covered here: for the more recent theory of stationary processes, refs. 13-17;
for treatments of inference problems in the continuous case, refs. 2, 21;
for discrete series with discontinuous variates, ref. 18; and for a test of fit
for such series, ref. 4.